
Storage - OwnCloud

256 MB RAM, 140 MB HDD, PHP


The chosen frontend is OwnCloud.


As a frontend, it should be as functional as Copy or Dropbox.


The interface should cover standard user needs, including at least:


- Nice, intuitive interface (AJAX, drag & drop)

- Upload/Download files from your computer via WebUI

- Public link manager: allows creating and deleting public links to files or folders

- File changes history

- Restore deleted files

- Encrypted storage

- Share files with a link; the decryption key can optionally be included in the URL (as Mega does)

- Share folders with a link; the decryption key can optionally be included in the URL (as Mega does)

- Client-side JavaScript encryption: data is encrypted before it leaves your computer, to protect against man-in-the-middle and other attacks

- Server-side encryption: stored data is encrypted so that nobody but you can access it

- Configure upload transfer rate

- File preview in the file manager (without opening a separate page)



First Start


The first time you log in to the frontend, it should:


- Create a user & administrator account

- Create a secure password (checked by a password-strength meter)



OwnCloud tracking model


OwnCloud has two tracking mechanisms that we should block:


- It tries to connect to an external domain (probably www.owncloud.org)

- It sends code to the browser, and the browser then connects to a tracking page



Encryption proposed model – General Overview

The encryption scheme has three layers. It is well known that more layers do not by themselves add more security, but each layer protects a different step of the process.


For file transfers, the file is encrypted before it leaves the browser (on upload) and decrypted when it arrives at the browser (on download). The original filename is kept, but the content is encrypted.


For file storage, OwnCloud encrypts the data with a key based on the user's key, and it is stored on a dm-crypt volume.



Upload File model


When you upload a file to storage, the client should:


1. Remove critical file metadata before the file leaves the browser

2. Encrypt it with JavaScript before it leaves the browser, if «secure mode» is enabled on the upload form or in the user configuration



Upload/Download Client Browser JavaScript Encryption


On upload, download, or collaborative editing of a file, it asks you for the password to access your file if the file is configured to be in «secure mode». The password never leaves your browser. A sketch of this flow follows the list below.



In this way:

- OwnCloud doesn't know what's inside your files

- Each file can have a different password for decryption (useful for sharing files with other people)

- The system can also run without «secure mode», trading some security for more usability for regular users
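To make the «secure mode» guarantee concrete, here is a minimal sketch of the intended flow. The real implementation would run as JavaScript in the browser (e.g. using WebCrypto); the Python below, the library calls and the parameter choices are only illustrative assumptions.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_for_upload(plaintext, password):
    """Encrypt a file client-side; only ciphertext and public values leave the client."""
    salt = os.urandom(16)
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=200000)
    key = kdf.derive(password.encode())   # per-file key; the password itself never leaves the client
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    # The server stores only salt, nonce and ciphertext; the original filename is kept.
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext}

Because the password is chosen per file, each file can be protected with a different password, which is what makes per-file sharing possible.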



OwnCloud Bug Fixing

Like all software, OwnCloud has errors reported on its bug tracker that need to be solved. The full list is included in the OwnCloud development Appendix 3.1.



OwnCloud Implementation Step 1

In the first implementation phase, OwnCloud loses the collaborative online document editing features for files in «secure mode» (files which were encrypted with client-side JavaScript).


OwnCloud Implementation Step 2

In the second implementation phase, OwnCloud regains collaborative online document editing for files in «secure mode» by developing a new P2P connection for the collaborative platform, described in Service 7.



Share Link proposal


When a user creates a link to share a document that is in «secure mode», it should be possible to embed the decryption password in the link so that the recipient is not asked for any password, as Mega does.


When User 2 opens the shared link, the browser goes to User 1's frontend and downloads the file.

When the download completes, the browser JavaScript asks for the decryption password if it is needed and not included in the link.

If the decryption password is included in the link, it never leaves the browser and is never logged in the HTTP server's request URLs.
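The "never logged" property can be obtained by placing the key in the URL fragment (the part after '#'), which browsers do not send to the server. The sketch below only illustrates that idea; the link format and parameter names are assumptions, and the real link handling would happen in the browser.

from urllib.parse import urlsplit

def build_share_link(frontend_url, file_id, key_b64=None):
    """Build a share link; the optional decryption key goes in the URL fragment."""
    link = "%s/s/%s" % (frontend_url, file_id)
    if key_b64:
        # Everything after '#' stays in the browser: it is not part of the HTTP
        # request, so it cannot appear in the server's access logs.
        link += "#key=" + key_b64
    return link

def extract_key(link):
    """Recover the key client-side, if the link carries one."""
    fragment = urlsplit(link).fragment
    if fragment.startswith("key="):
        return fragment[len("key="):]
    return None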



File Indexation


When a file is not in «secure mode», OwnCloud indexes both its filename (as it does now) and its content.


When a user searches for a file, they can choose to search by filename only, by content only, or by both.
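As an illustration of those search options, here is a minimal sketch assuming an in-memory index of extracted text built only for files that are not in «secure mode»; all names are hypothetical.

def search(index, query, by_name=True, by_content=True):
    """index: {path: extracted_text} for non-«secure mode» files only."""
    q = query.lower()
    hits = []
    for path, text in index.items():
        if (by_name and q in path.lower()) or (by_content and q in text.lower()):
            hits.append(path)
    return hits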


Specification

Feature (as offered by Dropbox / GDrive / Mega) | OwnCloud
Upload files from your computer via WebUI | X
Sync a folder of your computer via Agent | X
Link manager (remove download links to be private again) | X
Filesystem management from WebUI | X
Separated view of uploaded images | X
File changes history | X
RSS feed of file changes | X
Collaborative online editing | X
Restore and download deleted files | X
Encrypted storage | X
Mobile applications (download and stream your files) | X
API for 3rd-party plugins | X
WebUI in many languages | X
List of recently added/modified files | X
Firefox extension | X
Create collaborative text document | X
Share files with a link (including decryption key in URL or not) | X (without decryption key)
Share folders with a link (including decryption key in URL or not) | X (without decryption key)
Share files by mail (including decryption key in URL or not) |
Share folders by mail (including decryption key in URL or not) |
Encryption on client side and server side | Server side
Encryption key management |
2048-bit RSA (or better) | ?
RSA key based on your password + entropy | ?
Configure upload transfer rate |
Lost password (master crypto key) = data loss | X (but it has a master key)
Create collaborative presentation |
Create collaborative spreadsheet |
Create collaborative form/poll |
Create collaborative drawing |
Show file preferences (which apps can open, owners, editors, etc.) |
Stores cache in the web browser so pages load faster |
File preview in the file manager (without opening a separate page) |
Owner list |
Connection history |
Mark documents as favorites |
File manager: view files in a grid |
File manager: view files in a list |
Statistics: hard-disk space |
Statistics: used bandwidth |
File-changes history searchable by date |
Drag'n'drop files to upload | X
Option to not keep history |
Transfer a folder to any other user |
List connected devices |
Resume file upload via WebUI |
Resume file upload via Agent |
Create public upload folders |
Share folders, files and links with other users |
Create user groups |
Applications (bookmarks, IM, music, etc.) | X
Nice intuitive interface | X
Metatag system to improve search |
Resume file transfers via WebUI |

Development Plan

Total development hours: 377h. Cost: 15080€

First Month (bug month) Hours: 172,5h Cost: 6900€
* Incomplete MP4 Files Download 6,25h 250€
* Share with expiration 7,5h 300€
* Share with group shares back to original user 5h 200€
* (Un-)Share a file after another user edited this 6,25h 250€
* Rename folder while files are uploading to it fails 6,25h 250€
* Delete shared folder while uploading 5h 200€
* inconsistancy in uploading 0 byte file 5h 200€
* Server to server sharing is not working properly with single files and documents 6,25h 250€
* Moving data directory causes redownload and lost of shares 5h 200€
* gzip decompess fails on some archives 6,25h 250€
* Warning on CLI password change if encryption is enabled 5h 200€
* Can't activate apps when using SSL 7,5h 300€
* Cannot jump to the folder from search result for shared folders/files 6,25h 250€
* E-mail notification bug (returns /var/www directory inside e-mail) 5h 200€
* Shared video freezes web browser (FireFox, Chrome, etc) 3,75h 150€
* Encryption not working. 6,25h 250€
* Opening Shared MS *.doc files fail on normal users. 2,5h 100€
* Disable "Share files" breaks files app 7,5h 300€
* Sending password reset eMail doesn't work 6,25h 250€
* OC6 Web Interface Freezing 8,75h 350€
* Mounted ftps doesn't show all the files and directories 6,25h 250€
* User files are readable by administrator even when Password recovery is disabled by user 6,25h 250€
* SMB/CIFS password visible in owncloud.log 3,75h 150€
* oc7 deleting user data! 6,25h 250€
* Fix username change [WIP] 2,5h 100€
* force loading of encryption app to show correct error 6,25h 250€
* No File Size shown for files bigger than 2 GB with FTP and "External storage support" 6,25h 250€
* sharing files don't decrypt 3,75h 150€
* Deleted Files Recovery doesn't work 6,25h 250€
* Logout don't destroy $_SESSION 3,75h 150€
* Password popup 3,75h 150€

Second Month Hours: 112h Cost: 4480€
* Can't Change Full Name for Users 3,75h 150€
* OC7.0.0 Cannot upload 2 directories through Drag and Drop 6,25h 250€
* A folder shared with two groups appears twice for an user in these two groups. 7,5h 300€
* rd level files/folders looking like they do not inherit privileges/permissions 6,25h 250€
* "Pictures" view mode bug with folder ending by a space 6,25h 250€
* [7.0.1] Confusion with share to user name 6,25h 250€
* [7.0.1] Drag and drop folder gives no error 7,5h 300€
* UI improvements for external storage configuration 5h 200€
* Folder specific views 6,25h 250€
* Show that the encryption recovery key password is set (usability) 5h 200€
* restoring deleted shares 5h 200€
* Seamless integration with Libreoffice 60h 2400€

Third Month Hours: 92,5h Cost: 3700€
* Share files with a link (including decryption key on URL or not; currently the decryption key cannot be included in the URL) 6,25h 250€
* Share folders with a link (including decryption key on URL or not; currently the decryption key cannot be included in the URL) 6,25h 250€
* Share files by mail (including decryption key on URL or not) 7,5h 300€
* Share folders by mail (including decryption key on URL or not) 6,25h 250€
* Configure upload transfer rate 7,5h 300€
* Stores cache on web browser to load webpage faster 6,25h 250€
* Statistics hardisk space 3,75h 150€
* JS to encrypt on webbrowser files before upload 6,25h 250€
* List connected devices 5h 200€
* Resume file upload on WebUI 10h 400€
* Create public upload folders with choosen criteria \\(not upload more than 1gb or whatever) 7,5h 300€
* Share folders, files and links with other OwnCloud users 8,75h 350€
* Resume file downloads on WebUi 8,75h 350€
* Manage password & encryption keys 5h 200€
* Private password for restore 3,75h 150€

Decentralized Backup - I2P+Tahoe-LAFS

50 MB RAM, 12 MB HDD, Python



Introduction

This document addresses the analysis of i2p-Tahoe-LAFS version 1.10 in order to implement three new features:

  • Quota management

  • Connection to multiple Helpers

  • Automatic spreading of Introducers & Helpers furls.


We begin with a short introduction to Tahoe-LAFS, and then proceed to analyse the requirements for the 3 proposed features. The analysis includes a review of how related functionality is now implemented in Tahoe-LAFS, which files should be modified and what modifications should be included for each of those files.


Short introduction to Tahoe-LAFS architecture

As a short reminder, a Tahoe-LAFS grid is composed of several types of nodes:


  • Introducer: keeps track of the StorageServer nodes connected to the grid and publishes the list so that StorageClients know which nodes they can connect to.

  • StorageServer: these nodes form the distributed data store.

  • HelperServer: an intermediate server which can be used to minimize upload time. Due to the redundancy introduced by erasure coding, uploading a file to the grid can be an order of magnitude slower than reading from it. The HelperServer acts as a proxy which receives the encrypted data from the StorageClient (encrypted, but with no redundancy), performs the erasure coding and distributes the resulting shares to StorageServers in the grid.

  • StorageClient: once they get the list of StorageServers in the grid from one introducer, they can connect to read and write data on the grid. Read operations are performed connecting directly to StorageServer nodes. Write operations can be performed connecting directly or using a HelperServer (only for immutable files as of Tahoe-LAFS 1.10.0).


For a full introduction to Tahoe-LAFS, see the docs folder in the source tree. You can also check the tutorial published on the Nilestore project's wiki1.



Diagram showing tahoe-lafs network topology (from tahoe-lafs official documentation). (Notice that Introducers and Helpers are not shown in it)




Code structure in Tahoe-LAFS

Tahoe-LAFS is developed in Python (2.6.6 – 2.x) and has very good test coverage (around 92% for 1.10). This section gives a short description of the Tahoe-LAFS source code.

We start by looking at Tahoe-LAFS source folder structure:

allmydata

├── frontends

├── immutable

│   └── downloader

├── introducer

├── mutable

├── scripts

├── storage

├── test

├── util

├── web

│   └── static

│   ├── css

│   └── img

└── windows

As a general rule, code specific to each feature's client and server is placed under that feature's folder, as client.py and server.py. All test files are placed under the test folder.



Some files relevant to the rest of the document2:

  • allmydata/client.py: this is the main file for Tahoe-LAFS; it contains the Client class which initializes most of the services (StorageFarmBroker, StorageServer, web/FTP/SFTP frontends, Helper, ...)

  • allmydata/introducer/server.py: the server side of the Introducer.

  • allmydata/introducer/client.py: the client side of the Introducer.

  • allmydata/storage/server.py: the server side of the storage.

  • allmydata/immutable/upload.py: manages connections to the Helper from the client side.

  • allmydata/immutable/offloaded.py: the Helper, server side.

  • allmydata/storage_client.py: functions related to the storage client.


Analysis of proposed features

Feature 1: Quotas

Introduction

Support for quota management ('accounting') in Tahoe-LAFS has been an ongoing development for several years. The scheme being used is based on accounts, which could be managed by a central AuthorityServer or independently by each of the StorageServers (the latter option being suited only for smaller grids). A detailed description of the intended accounting architecture and development roadmap can be found in the project's documentation3.

The objective of quota management in CommunityCube is to ensure that a user who contributes a given amount of space to the grid can use the equivalent amount of space in it.

User accounts pose obvious privacy and anonymity risks. We have thus investigated a different approach to the problem: controlling quota management from the StorageClient itself.

This implementation comes, however, with its own set of drawbacks: it can be easily defeated by using a modified StorageClient, and it requires keeping a local record of the files stored in the grid4 (something Tahoe-LAFS does not otherwise require as long as you keep a copy of the capabilities you are interested in), which is also a significant threat from the privacy point of view. As an alternative to keeping a record of every uploaded file, users can be forced to use a single directory as the root for all the files they upload (known as a rootcap5). The content under that directory can be accounted for with a call to 'tahoe deep-check --add-lease ALIAS:', where ALIAS stands for the alias of the rootcap directory.

This approach seems to be the most compatible with CommunityCube's objectives, and its adoption relies on the belief that CommunityCube's users will play fair with the rest of the community.

The proposed system can be easily bypassed by malicious actors, but it will however ensure that the grid is not abused due to user mistakes or lack of knowledge on the grid's working principles and capacity.

Description of proposed feature

Quota management will be handled by the StorageClient, which imposes the limits on what can be uploaded to the grid. When a file is to be uploaded, the StorageClient:

      1. Checks that the storage server is running and writable.

      2. Calculates the space it is sharing in the associated storage server:

        1. Available disk space

        2. Reserved disk space (minimum free space to be reserved)

        3. Size of stored shares

      3. Calculates the size of the leases it holds on files stored in the grid (this requires a catalog of uploaded files and tracking of lease expiration/renewal).

      4. Estimates the assigned space as 'sharing space' (available + stored shares).

      5. Checks that the used space (i.e. the sum of leases) is smaller than the sharing space.

      6. Retrieves the grid's “X out of K” parameters used in erasure coding.

      7. Verifies the predicted used space and reports an error if the available quota would be exceeded (a sketch of this check follows below).
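The following is a minimal sketch of steps 3 to 7, assuming the document's “X out of K” erasure-coding convention (X shares needed out of K total, so each uploaded byte costs roughly K/X bytes of raw grid storage). All names are illustrative; the real check would live in the StorageClient.

import math

def predicted_grid_usage(file_sizes, needed_x, total_k):
    """Raw bytes the grid spends on the files we hold leases on (steps 3 and 6)."""
    expansion = float(total_k) / needed_x        # e.g. 10/3, roughly 3.3x overhead
    return sum(int(math.ceil(size * expansion)) for size in file_sizes)

def check_quota(new_file_size, leased_file_sizes, sharing_space, needed_x, total_k):
    """Refuse an upload that would exceed the space we contribute (steps 4, 5 and 7)."""
    used = predicted_grid_usage(leased_file_sizes + [new_file_size], needed_x, total_k)
    if used > sharing_space:                     # sharing space = available + stored shares
        raise RuntimeError("Upload would exceed the quota contributed to the grid")
    return sharing_space - used                  # remaining quota after the upload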

Existing code analysis

We will have a look at how the following functionality is implemented in Tahoe-LAFS:

  1. The upload of a file (to the Helper or directly to other StorageServers via the StorageFarmBroker).

  2. Check if the StorageServer is running.

  3. The statistics associated with the space used and available on the StorageServer.

  4. The moment the leases are renewed in remote StorageServers.

In the next paragraphs we show how the system works with the corresponding code.

  1. The upload of a file

The upload takes place at different classes depending on the type of data being uploaded. For immutable files, it is the Uploader service, which is defined in allmydata/immutable/upload.py. For mutable files, it is defined in allmydata/mutable/filenode.py.

These functions can be accessed from the main client, using an intermediate call to a NodeMaker instance, or directly calling the uploader:

File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):


(…)


# these four methods are the primitives for creating filenodes and

# dirnodes. The first takes a URI and produces a filenode or (new-style)

# dirnode. The other three create brand-new filenodes/dirnodes.


def create_node_from_uri(self, write_uri, read_uri=None, deep_immutable=False, name="<unknown name>"):

# This returns synchronously.

# Note that it does *not* validate the write_uri and read_uri; instead we

# may get an opaque node if there were any problems.

return self.nodemaker.create_from_cap(write_uri, read_uri, deep_immutable=deep_immutable, name=name)


def create_dirnode(self, initial_children={}, version=None):

d = self.nodemaker.create_new_mutable_directory(initial_children, version=version)

return d


def create_immutable_dirnode(self, children, convergence=None):

return self.nodemaker.create_immutable_directory(children, convergence)


def create_mutable_file(self, contents=None, keysize=None, version=None):

return self.nodemaker.create_mutable_file(contents, keysize,

version=version)


def upload(self, uploadable):

uploader = self.getServiceNamed("uploader")

return uploader.upload(uploadable)




File: allmydata/nodemaker.py


class NodeMaker:

implements(INodeMaker)


(…)


def create_mutable_file(self, contents=None, keysize=None, version=None):

if version is None:

version = self.mutable_file_default

n = MutableFileNode(self.storage_broker, self.secret_holder,

self.default_encoding_parameters, self.history)

d = self.key_generator.generate(keysize)

d.addCallback(n.create_with_keys, contents, version=version)

d.addCallback(lambda res: n)

return d


def create_new_mutable_directory(self, initial_children={}, version=None):

# initial_children must have metadata (i.e. {} instead of None)

for (name, (node, metadata)) in initial_children.iteritems():

precondition(isinstance(metadata, dict),

"create_new_mutable_directory requires metadata to be a dict, not None", metadata)

node.raise_error()

d = self.create_mutable_file(lambda n:

MutableData(pack_children(initial_children,

n.get_writekey())),

version=version)

d.addCallback(self._create_dirnode)

return d


def create_immutable_directory(self, children, convergence=None):

if convergence is None:

convergence = self.secret_holder.get_convergence_secret()

packed = pack_children(children, None, deep_immutable=True)

uploadable = Data(packed, convergence)

d = self.uploader.upload(uploadable)

d.addCallback(lambda results:

self.create_from_cap(None, results.get_uri()))

d.addCallback(self._create_dirnode)

return d






File: allmydata/immutable/upload.py


class Uploader(service.MultiService, log.PrefixingLogMixin):

"""I am a service that allows file uploading. I am a service-child of the

Client.

"""

(...)


def upload(self, uploadable):

"""

Returns a Deferred that will fire with the UploadResults instance.

"""

assert self.parent

assert self.running


uploadable = IUploadable(uploadable)

d = uploadable.get_size()

def _got_size(size):

default_params = self.parent.get_encoding_parameters()

precondition(isinstance(default_params, dict), default_params)

precondition("max_segment_size" in default_params, default_params)

uploadable.set_default_encoding_parameters(default_params)


if self.stats_provider:

self.stats_provider.count('uploader.files_uploaded', 1)

self.stats_provider.count('uploader.bytes_uploaded', size)


if size <= self.URI_LIT_SIZE_THRESHOLD:

uploader = LiteralUploader()

return uploader.start(uploadable)

else:

eu = EncryptAnUploadable(uploadable, self._parentmsgid)

d2 = defer.succeed(None)

storage_broker = self.parent.get_storage_broker()

if self._helper:

uploader = AssistedUploader(self._helper, storage_broker)

d2.addCallback(lambda x: eu.get_storage_index())

d2.addCallback(lambda si: uploader.start(eu, si))

else:

storage_broker = self.parent.get_storage_broker()

secret_holder = self.parent._secret_holder

uploader = CHKUploader(storage_broker, secret_holder)

d2.addCallback(lambda x: uploader.start(eu))


self._all_uploads[uploader] = None

if self._history:

self._history.add_upload(uploader.get_upload_status())

def turn_verifycap_into_read_cap(uploadresults):

# Generate the uri from the verifycap plus the key.

d3 = uploadable.get_encryption_key()

def put_readcap_into_results(key):

v = uri.from_string(uploadresults.get_verifycapstr())

r = uri.CHKFileURI(key, v.uri_extension_hash, v.needed_shares, v.total_shares, v.size)

uploadresults.set_uri(r.to_string())

return uploadresults

d3.addCallback(put_readcap_into_results)

return d3

d2.addCallback(turn_verifycap_into_read_cap)

return d2

d.addCallback(_got_size)

def _done(res):

uploadable.close()

return res

d.addBoth(_done)

return d




We have highlighted the _got_size callback that starts the upload, and the three available ways to upload immutable content: with a LiteralUploader for small files, through a Helper, or directly through the StorageFarmBroker.

In the case of mutable files, we have to check the moment when we upload a new file and when we modify it (or fully overwrite it via MutableFileNode.overwrite or MutableFileNode.update):



File: allmydata/mutable/filenode.py


class MutableFileNode:

implements(IMutableFileNode, ICheckable)


def __init__(self, storage_broker, secret_holder,

default_encoding_parameters, history):

self._storage_broker = storage_broker

self._secret_holder = secret_holder

self._default_encoding_parameters = default_encoding_parameters

self._history = history

self._pubkey = None # filled in upon first read

self._privkey = None # filled in if we're mutable

# we keep track of the last encoding parameters that we use. These

# are updated upon retrieve, and used by publish. If we publish

# without ever reading (i.e. overwrite()), then we use these values.

self._required_shares = default_encoding_parameters["k"]

self._total_shares = default_encoding_parameters["n"]

self._sharemap = {} # known shares, shnum-to-[nodeids]

self._most_recent_size = None

(...)

def create_with_keys(self, (pubkey, privkey), contents,

version=SDMF_VERSION):

"""Call this to create a brand-new mutable file. It will create the

shares, find homes for them, and upload the initial contents (created

with the same rules as IClient.create_mutable_file() ). Returns a

Deferred that fires (with the MutableFileNode instance you should

use) when it completes.

"""

self._pubkey, self._privkey = pubkey, privkey

pubkey_s = self._pubkey.serialize()

privkey_s = self._privkey.serialize()

self._writekey = hashutil.ssk_writekey_hash(privkey_s)

self._encprivkey = self._encrypt_privkey(self._writekey, privkey_s)

self._fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s)

if version == MDMF_VERSION:

self._uri = WriteableMDMFFileURI(self._writekey, self._fingerprint)

self._protocol_version = version

elif version == SDMF_VERSION:

self._uri = WriteableSSKFileURI(self._writekey, self._fingerprint)

self._protocol_version = version

self._readkey = self._uri.readkey

self._storage_index = self._uri.storage_index

initial_contents = self._get_initial_contents(contents)

return self._upload(initial_contents, None)


(…)


def overwrite(self, new_contents):

"""

I overwrite the contents of the best recoverable version of this

mutable file with new_contents. This is equivalent to calling

overwrite on the result of get_best_mutable_version with

new_contents as an argument. I return a Deferred that eventually

fires with the results of my replacement process.

"""

# TODO: Update downloader hints.

return self._do_serialized(self._overwrite, new_contents)


(…)


def upload(self, new_contents, servermap):

"""

I overwrite the contents of the best recoverable version of this

mutable file with new_contents, using servermap instead of

creating/updating our own servermap. I return a Deferred that

fires with the results of my upload.

"""

# TODO: Update downloader hints

return self._do_serialized(self._upload, new_contents, servermap)



def modify(self, modifier, backoffer=None):

"""

I modify the contents of the best recoverable version of this

mutable file with the modifier. This is equivalent to calling

modify on the result of get_best_mutable_version. I return a

Deferred that eventually fires with an UploadResults instance

describing this process.

"""

# TODO: Update downloader hints.

return self._do_serialized(self._modify, modifier, backoffer)







In addition to the relevant functions, we have also highlighted the values of k and n, which are required to estimate how much disk space a new file will take.



  2. Check if the StorageServer is running.

The StorageServer is initialized in allmydata/client.py, in function Client.init_storage, according to configuration values.

File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):


(…)


def init_storage(self):

# should we run a storage server (and publish it for others to use)?

if not self.get_config("storage", "enabled", True, boolean=True):

return

readonly = self.get_config("storage", "readonly", False, boolean=True)


storedir = os.path.join(self.basedir, self.STOREDIR)


data = self.get_config("storage", "reserved_space", None)

try:

reserved = parse_abbreviated_size(data)

except ValueError:

log.msg("[storage]reserved_space= contains unparseable value %s"

% data)

raise

if reserved is None:

reserved = 0

(…)

ss = StorageServer(storedir, self.nodeid,

reserved_space=reserved,

discard_storage=discard,

readonly_storage=readonly,

stats_provider=self.stats_provider,

expiration_enabled=expire,

expiration_mode=mode,

expiration_override_lease_duration=o_l_d,

expiration_cutoff_date=cutoff_date,

expiration_sharetypes=expiration_sharetypes)

self.add_service(ss)


d = self.when_tub_ready()

# we can't do registerReference until the Tub is ready

def _publish(res):

furl_file = os.path.join(self.basedir, "private", "storage.furl").encode(get_filesystem_encoding())

furl = self.tub.registerReference(ss, furlFile=furl_file)

ann = {"anonymous-storage-FURL": furl,

"permutation-seed-base32": self._init_permutation_seed(ss),

}


current_seqnum, current_nonce = self._sequencer()


for ic in self.introducer_clients:

ic.publish("storage", ann, current_seqnum, current_nonce, self._node_key)


d.addCallback(_publish)

d.addErrback(log.err, facility="tahoe.init",

level=log.BAD, umid="aLGBKw")




To find out whether the StorageServer is running, we have to recover the parent of the service we are in (i.e. the Uploader). We will be working with services that are 'children' of the main Client instance, and we can check whether the client is running a given service (e.g. the storage service) as is done in allmydata/web/root.py:

File: allmydata/web/root.py


class Root(rend.Page):

(...)

def __init__(self, client, clock=None, now=None):

(...)

try:

s = client.getServiceNamed("storage")

except KeyError:

s = None

(...)



  3. The statistics associated with the space used and available on the StorageServer.

From the StorageServer service we get access to the StorageServer.get_stats function:




class StorageServer(service.MultiService, Referenceable):


(…)


def get_stats(self):

# remember: RIStatsProvider requires that our return dict

# contains numeric values.

stats = { 'storage_server.allocated': self.allocated_size(), }

stats['storage_server.reserved_space'] = self.reserved_space

for category,ld in self.get_latencies().items():

for name,v in ld.items():

stats['storage_server.latencies.%s.%s' % (category, name)] = v


try:

disk = fileutil.get_disk_stats(self.sharedir, self.reserved_space)

writeable = disk['avail'] > 0


# spacetime predictors should use disk_avail / (d(disk_used)/dt)

stats['storage_server.disk_total'] = disk['total']

stats['storage_server.disk_used'] = disk['used']

stats['storage_server.disk_free_for_root'] = disk['free_for_root']

stats['storage_server.disk_free_for_nonroot'] = disk['free_for_nonroot']

stats['storage_server.disk_avail'] = disk['avail']

except AttributeError:

writeable = True

except EnvironmentError:

log.msg("OS call to get disk statistics failed", level=log.UNUSUAL)

writeable = False


if self.readonly_storage:

stats['storage_server.disk_avail'] = 0

writeable = False


stats['storage_server.accepting_immutable_shares'] = int(writeable)

s = self.bucket_counter.get_state()

bucket_count = s.get("last-complete-bucket-count")

if bucket_count:

stats['storage_server.total_bucket_count'] = bucket_count

return stats




  4. The leases held by the StorageClient, and their equivalent size on disk (i.e. the amount of storage we have spent).

Leases are created whenever we upload a new file, and they are renewed from the client at three points: in immutable/checker.py (lease renewal for immutable files), in mutable/servermap.py (called from mutable/checker.py; lease renewal for mutable files) and in scripts/tahoe_check.py (CLI interface).

File: allmydata/immutable/checker.py


class Checker(log.PrefixingLogMixin):

"""I query all servers to see if M uniquely-numbered shares are

available.


(…)


def _get_buckets(self, s, storageindex):

"""Return a deferred that eventually fires with ({sharenum: bucket},

serverid, success). In case the server is disconnected or returns a

Failure then it fires with ({}, serverid, False) (A server

disconnecting or returning a Failure when we ask it for buckets is

the same, for our purposes, as a server that says it has none, except

that we want to track and report whether or not each server

responded.)"""


rref = s.get_rref()

lease_seed = s.get_lease_seed()

if self._add_lease:

renew_secret = self._get_renewal_secret(lease_seed)

cancel_secret = self._get_cancel_secret(lease_seed)

d2 = rref.callRemote("add_lease", storageindex,

renew_secret, cancel_secret)

d2.addErrback(self._add_lease_failed, s.get_name(), storageindex)


(...)



File: allmydata/mutable/servermap.py


class ServermapUpdater:

def __init__(self, filenode, storage_broker, monitor, servermap,

mode=MODE_READ, add_lease=False, update_range=None):

"""I update a servermap, locating a sufficient number of useful

shares and remembering where they are located.


"""


(…)


def _do_read(self, server, storage_index, shnums, readv):

ss = server.get_rref()

if self._add_lease:

# send an add-lease message in parallel. The results are handled

# separately. This is sent before the slot_readv() so that we can

# be sure the add_lease is retired by the time slot_readv comes

# back (this relies upon our knowledge that the server code for

# add_lease is synchronous).

renew_secret = self._node.get_renewal_secret(server)

cancel_secret = self._node.get_cancel_secret(server)

d2 = ss.callRemote("add_lease", storage_index,

renew_secret, cancel_secret)

# we ignore success

d2.addErrback(self._add_lease_failed, server, storage_index)

d = ss.callRemote("slot_readv", storage_index, shnums, readv)

return d

(...)



File: allmydata/scripts/tahoe_check.py


def check_location(options, where):

stdout = options.stdout

stderr = options.stderr

nodeurl = options['node-url']

if not nodeurl.endswith("/"):

nodeurl += "/"

try:

rootcap, path = get_alias(options.aliases, where, DEFAULT_ALIAS)

except UnknownAliasError, e:

e.display(stderr)

return 1

if path == '/':

path = ''

url = nodeurl + "uri/%s" % urllib.quote(rootcap)

if path:

url += "/" + escape_path(path)

# todo: should it end with a slash?

url += "?t=check&output=JSON"

if options["verify"]:

url += "&verify=true"

if options["repair"]:

url += "&repair=true"

if options["add-lease"]:

url += "&add-lease=true"


resp = do_http("POST", url)

if resp.status != 200:

print >>stderr, format_http_error("ERROR", resp)

return 1

jdata = resp.read()

if options.get("raw"):

stdout.write(jdata)

stdout.write("\n")

return 0

data = simplejson.loads(jdata)


Required functionality per module

Storage Client:

File: allmydata/client.py

Introduce code in the functions used to create new nodes to keep track of files uploaded to the grid. It may be necessary to move this accounting code down to the immutable Uploader.upload function or the mutable MutableFileNode.update/overwrite functions if they are called directly from other parts of Tahoe-LAFS (not exclusively from the client). Alternatively, if we use the single-rootcap strategy, force any new file to live under the rootcap.

Create a new function in the client that recovers the StorageServer service and uses its usage statistics, the erasure-coding parameters and the statistics for uploaded files to estimate the remaining quota (sketched below).
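A possible shape for that function, reusing only the APIs already quoted in this document (getServiceNamed and StorageServer.get_stats); the stats keys chosen, the local catalog of uploaded files and the function name are assumptions:

# Sketch for allmydata/client.py; names and stats keys are assumptions.
def get_remaining_quota(self):
    try:
        ss = self.getServiceNamed("storage")      # is the StorageServer running?
    except KeyError:
        return None                               # no storage service, nothing is shared
    stats = ss.get_stats()
    # 'sharing space' = space still available + space already taken by stored shares
    sharing_space = (stats.get('storage_server.disk_avail', 0)
                     + stats.get('storage_server.allocated', 0))
    k = self.encoding_params["k"]                 # erasure coding: k shares needed out of n total
    n = self.encoding_params["n"]
    # hypothetical local catalog of the sizes of files we have uploaded / hold leases on
    used = sum(size * n // k for size in self._uploaded_file_sizes())
    return sharing_space - used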


Files: immutable/checker.py, mutable/servermap.py, scripts/tahoe_check.py

Introduce accounting of the times a lease is renewed against the database of uploaded files (only needed if we are creating a local database; it is not required if we use the single rootcap directory).


Web frontend

File: allmydata/web/root.py

Add functionality to show the updated quota data.

File: allmydata/web/welcome.xhtml

Modify the template to show shared remaining/total quota information.

Tests

File: allmydata/test/test_client.py

Add tests to verify that new uploads are properly accounted in the uploads database (or that they lie under the rootcap dir)

File: allmydata/test/test_storage.py

Add tests to verify that new uploads are properly accounted in the uploads database (or that they lie under the rootcap dir)

Documentation

File: docs/architecture.rst

Include a brief description of the quota management system implementation.

File: docs/quotas.rst

Create a new file under docs describing in detail the implemented quota system.

Feature 2: Multiple helpers

Introduction

Helpers are used in Tahoe-LAFS to cope with the overhead factor imposed by erasure coding and the asymmetric upload/download bandwidth of ADSL connections. Uploading a file requires K/X times more bandwidth than the file size (assuming an X-out-of-K storage scheme) and than the corresponding download operation from the grid. Given these asymmetric bandwidth requirements and upload/download channel capacities, the upload operation can be orders of magnitude slower than its corresponding download.

To help ease this problem, Helper nodes (assumed to have an uplink with greater capacity than the user's) receive the ciphertext directly from the StorageClient (i.e. files that have already been encrypted, but have not yet been segmented and erasure-coded), erasure-code it and distribute the resulting shares to StorageServers. This way the amount of data to be uploaded by the StorageClient is limited to the size of the file to be uploaded, with the overhead being handled by the Helper.

As of version 1.10, i2p-Tahoe-LAFS can only be configured to use a single helper server, which (if used) must be specified in tahoe.cfg. Allowing the StorageClient to choose among a list of available helpers will add flexibility to the network and allow the StorageClient to choose the least-loaded Helper at a given moment.


Description of proposed feature

Instead of the single value now stored in tahoe.cfg, we need a list of Helpers and the possibility to select one of them from that list using a particular selection algorithm.

  1. Allow for a variable number of helpers, statically listed in BASEDIR/helpers.

  2. Before sending a file to a helper:

    1. Check all the helpers to retrieve their statistics.

    2. Choose the helper with the best stats.

  3. Send the ciphertext to the chosen Helper.


Existing code analysis

When a new client is started, it recovers the helper.furl from section [client] in tahoe.cfg. Its value is then used to initialize the Uploader service, as seen below:

File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):


(…)


def init_client(self):

helper_furl = self.get_config("client", "helper.furl", None)

if helper_furl in ("None", ""):

helper_furl = None


DEP = self.encoding_params

DEP["k"] = int(self.get_config("client", "shares.needed", DEP["k"]))

DEP["n"] = int(self.get_config("client", "shares.total", DEP["n"]))

DEP["happy"] = int(self.get_config("client", "shares.happy", DEP["happy"]))


self.init_client_storage_broker()

self.history = History(self.stats_provider)

self.terminator = Terminator()

self.terminator.setServiceParent(self)

self.add_service(Uploader(helper_furl, self.stats_provider,

self.history))




In the Uploader class, we find the code that initializes the helper connection, handles the helper connection being established or lost, and recovers helper information:

File: allmydata/immutable/upload.py


class Uploader(service.MultiService, log.PrefixingLogMixin):

(...)

def __init__(self, helper_furl=None, stats_provider=None, history=None):

self._helper_furl = helper_furl

self.stats_provider = stats_provider

self._history = history

self._helper = None

self._all_uploads = weakref.WeakKeyDictionary() # for debugging

log.PrefixingLogMixin.__init__(self, facility="tahoe.immutable.upload")

service.MultiService.__init__(self)


def startService(self):

service.MultiService.startService(self)

if self._helper_furl:

self.parent.tub.connectTo(self._helper_furl,

self._got_helper)


def _got_helper(self, helper):

self.log("got helper connection, getting versions")

default = { "http://allmydata.org/tahoe/protocols/helper/v1" :

{ },

"application-version": "unknown: no get_version()",

}

d = add_version_to_remote_reference(helper, default)

d.addCallback(self._got_versioned_helper)


def _got_versioned_helper(self, helper):

needed = "http://allmydata.org/tahoe/protocols/helper/v1"

if needed not in helper.version:

raise InsufficientVersionError(needed, helper.version)

self._helper = helper


def _lost_helper(self):

self._helper = None


def get_helper_info(self):

# return a tuple of (helper_furl_or_None, connected_bool)

return (self._helper_furl, bool(self._helper))



Finally, in the upload function, the Helper connection is used if it is available, and the node's storage broker otherwise:

File: allmydata/immutable/upload.py


class Uploader(service.MultiService, log.PrefixingLogMixin):


(...)


def upload(self, uploadable):

"""

Returns a Deferred that will fire with the UploadResults instance.

"""

assert self.parent

assert self.running


uploadable = IUploadable(uploadable)

d = uploadable.get_size()

def _got_size(size):

default_params = self.parent.get_encoding_parameters()

precondition(isinstance(default_params, dict), default_params)

precondition("max_segment_size" in default_params, default_params)

uploadable.set_default_encoding_parameters(default_params)


if self.stats_provider:

self.stats_provider.count('uploader.files_uploaded', 1)

self.stats_provider.count('uploader.bytes_uploaded', size)


if size <= self.URI_LIT_SIZE_THRESHOLD:

uploader = LiteralUploader()

return uploader.start(uploadable)

else:

eu = EncryptAnUploadable(uploadable, self._parentmsgid)

d2 = defer.succeed(None)

storage_broker = self.parent.get_storage_broker()

if self._helper:

uploader = AssistedUploader(self._helper, storage_broker)

d2.addCallback(lambda x: eu.get_storage_index())

d2.addCallback(lambda si: uploader.start(eu, si))

else:

storage_broker = self.parent.get_storage_broker()

secret_holder = self.parent._secret_holder

uploader = CHKUploader(storage_broker, secret_holder)

d2.addCallback(lambda x: uploader.start(eu))


self._all_uploads[uploader] = None

if self._history:

self._history.add_upload(uploader.get_upload_status())

def turn_verifycap_into_read_cap(uploadresults):

# Generate the uri from the verifycap plus the key.

d3 = uploadable.get_encryption_key()

def put_readcap_into_results(key):

v = uri.from_string(uploadresults.get_verifycapstr())

r = uri.CHKFileURI(key, v.uri_extension_hash, v.needed_shares, v.total_shares, v.size)

uploadresults.set_uri(r.to_string())

return uploadresults

d3.addCallback(put_readcap_into_results)

return d3

d2.addCallback(turn_verifycap_into_read_cap)

return d2

d.addCallback(_got_size)

def _done(res):

uploadable.close()

return res

d.addBoth(_done)

return d




Rendering related to the uploader is done in the web interface:

File: allmydata/web/root.py


class Root(rend.Page):


def data_helper_furl_prefix(self, ctx, data):

try:

uploader = self.client.getServiceNamed("uploader")

except KeyError:

return None

furl, connected = uploader.get_helper_info()

if not furl:

return None

# trim off the secret swissnum

(prefix, _, swissnum) = furl.rpartition("/")

return "%s/[censored]" % (prefix,)


def data_helper_description(self, ctx, data):

if self.data_connected_to_helper(ctx, data) == "no":

return "Helper not connected"

return "Helper"


def data_connected_to_helper(self, ctx, data):

try:

uploader = self.client.getServiceNamed("uploader")

except KeyError:

return "no" # we don't even have an Uploader

furl, connected = uploader.get_helper_info()


if furl is None:

return "not-configured"

if connected:

return "yes"

return "no"



These functions are accessed from the welcome page template, which is rendered by Nevow:

File: allmydata/web/welcome.xhtml


(…)


<div>

<h3>

<div><n:attr name="class">status-indicator connected-<n:invisible n:render="string" n:data="connected_to_helper" /></n:attr></div>

<div n:render="string" n:data="helper_description" />

</h3>

<div class="furl" n:render="string" n:data="helper_furl_prefix" />

</div>


(…)



Tests are implemented in allmydata/test/test_helper.py

File: allmydata/test/test_helper.py

class AssistedUpload(unittest.TestCase):

(...)

def setUpHelper(self, basedir, helper_class=Helper_fake_upload):

fileutil.make_dirs(basedir)

self.helper = h = helper_class(basedir,

self.s.storage_broker,

self.s.secret_holder,

None, None)

self.helper_furl = self.tub.registerReference(h)




def test_one(self):

self.basedir = "helper/AssistedUpload/test_one"

self.setUpHelper(self.basedir)

u = upload.Uploader(self.helper_furl)

u.setServiceParent(self.s)


d = wait_a_few_turns()


def _ready(res):

assert u._helper


return upload_data(u, DATA, convergence="some convergence string")

d.addCallback(_ready)

(…)


def test_previous_upload_failed(self):


(...)

f = open(encfile, "wb")

f.write(encryptor.process(DATA))

f.close()


u = upload.Uploader(self.helper_furl)

u.setServiceParent(self.s)


d = wait_a_few_turns()


def _ready(res):

assert u._helper

return upload_data(u, DATA, convergence="test convergence string")

d.addCallback(_ready)



(…)


def test_already_uploaded(self):

self.basedir = "helper/AssistedUpload/test_already_uploaded"

self.setUpHelper(self.basedir, helper_class=Helper_already_uploaded)

u = upload.Uploader(self.helper_furl)

u.setServiceParent(self.s)


d = wait_a_few_turns()





Proposed modifications:

Client

File: allmydata/client.py

Add a MULTI_HELPERS_CFG variable with the path to the helpers file.

Create an init_upload_helpers_list function to parse the file and return the list of furls; it must also take helper.furl in tahoe.cfg into account for compatibility (sketched below).

Update init_client to call init_upload_helpers_list. Refactor the code that reads and writes the multiple-introducers list into a generic 'list of furls' manager that can be shared by the multiple-introducers and multiple-helpers initialization code. This refactoring will also be useful for feature 3 (spreading servers), given that both lists will be updated with a similar mechanism.

Eventually, rename init_helper to init_helper_server.
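A possible sketch of init_upload_helpers_list, modeled on the existing init_introducer_clients code quoted in Feature 3 below; the MULTI_HELPERS_CFG value and the exact behaviour are proposals, not final design:

import os

MULTI_HELPERS_CFG = "helpers"    # proposed file name: BASEDIR/helpers

def init_upload_helpers_list(self):
    """Return the list of helper furls from BASEDIR/helpers plus tahoe.cfg."""
    helper_furls = []
    cfg = os.path.join(self.basedir, MULTI_HELPERS_CFG)
    if os.path.exists(cfg):
        with open(cfg) as f:
            for line in f:
                furl = line.strip()
                if furl and not furl.startswith('#'):
                    helper_furls.append(furl)
    # keep backwards compatibility with the single helper.furl in tahoe.cfg
    legacy = self.get_config("client", "helper.furl", None)
    if legacy not in (None, "None", "") and legacy not in helper_furls:
        helper_furls.append(legacy)
    return helper_furls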

File: allmydata/immutable/upload.py

Refactor Uploader:

Create a wrapper class to handle connections with remote helper servers, using the functions _got_helper, _got_versioned_helper, _lost_helper and get_helper_info from the Uploader class.

Create a list of available helpers from the helpers list passed during initialization.

Create a hook function to select which server to use for uploading (sketched below):

Choose the best helper server based on the availability of helper servers and their statistics.

Fall back to the standard broker if no helper is available.
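A minimal sketch of that selection hook; the wrapper interface and the statistic used for ranking are assumptions:

def choose_helper(helper_connections):
    """helper_connections: wrapper objects for the configured Helper servers."""
    candidates = [h for h in helper_connections if h.is_connected()]
    if not candidates:
        return None        # caller falls back to CHKUploader via the storage broker
    # e.g. prefer the helper reporting the fewest active uploads in its stats
    return min(candidates,
               key=lambda h: h.get_stats().get("chk_upload_helper.active_uploads", 0))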

Gui

File: allmydata/web/root.py / allmydata/web/welcome.xhtml

Modify the functions Root.data_helper_furl_prefix, Root.data_helper_description and Root.data_connected_to_helper and the Nevow template to accommodate a list of helpers instead of a single helper. (See the patch for Tahoe-LAFS issue #1010 for those two files.)



Tests

File: allmydata/test/test_helper.py

Add several fake uploaders to the file and verify that the selection works correctly according to the (fake) server statistics.

New file: allmydata/test/test_multi_helpers.py

New test file to check that the client properly parses the list of multiple helpers and that the Uploader is properly initialized (see allmydata/test/test_multi_introducers.py for reference).

Documentation

  • Describe the changes implemented in the following files:

    • docs/architecture.rst.

    • docs/configuration.rst.

    • docs/helper.rst.

Implementation notes

Patches for similar functionality have already been published into Tahoe-LAFS repository. They can be used as a guide for implementation details:

  1. Support of multiple introducers: provides a sample of how to move from a single introducer to a list of introducers6 7.

  2. Hook in server selection when choosing a remote StorageServer: sample of how we can implement a programmable hook to choose the target server in a generic way8.

Feature 3: Spreading servers (introducers, helpers)

Description

Version 1.10 of Tahoe-LAFS allows specifying a list of multiple introducers. However, this list is static, specified per installation in the BASEDIR/introducers file (thanks to the multi-introducers patch used in i2p-Tahoe-LAFS), because the introducer only publishes a list of available StorageServers, not of available Introducers. The same applies to the list of Helpers once the multi-helpers modification is implemented.


The proposed feature consists of:

a) publishing a list of known Introducers that will be used to update the StorageClient's list of introducers;

b) publishing a list of known Helpers that will be used to update the StorageClient's list of helpers.


Configuration in tahoe.cfg will be used to indicate:

  • In StorageClients:

    • Whether or not the list of Introducers should be updated automatically.

    • Whether or not the list of Helpers should be updated automatically.

  • In Helper nodes:

    • Whether the furl of the Helper node should be published via the introducer.

  • In Introducer nodes:

    • Whether the list of alternative introducers at BASEDIR/introducers should be published.


Specification

We will use existing Introducer infrastructure to publish the furls of Helpers and Introducers.

Required functionality:

  1. A StorageClient can subscribe to notifications of 'introducer' and 'helper' services, in addition to the 'storage' service to which it subscribes now.

  2. The StorageClient will update the BASEDIR/helpers or BASEDIR/introducers file according to the data received from the Introducer.

  3. A Helper can publish its furl via an Introducer, which will distribute it to connected StorageClients.

  4. An Introducer can publish a list of alternative Introducers to the StorageClients that are connected to it. The list distributed is that stored in the BASEDIR/introducers file.

Existing code analysis

We analyse functionality related to the modifications listed above:

  1. The initialization of the introducers list from the configuration file

  2. The connection of the StorageClient to the IntroducerServer (using its IntroducerClient), and how it publishes its furl and subscribes to receive the furls of other StorageServers.

  3. The initialization of a Helper server.

  4. The initialization of an Introducer server.

Below we can find the code that initializes the list of introducers in allmydata/client.py:



File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):


(…)


def __init__(self, basedir="."):

node.Node.__init__(self, basedir)

self.started_timestamp = time.time()

self.logSource="Client"

self.encoding_params = self.DEFAULT_ENCODING_PARAMETERS.copy()

self.init_introducer_clients()

self.init_stats_provider()

self.init_secrets()

self.init_node_key()

self.init_storage()



(…)


def init_introducer_clients(self):

self.introducer_furls = []

self.warn_flag = False

# Try to load ""BASEDIR/introducers" cfg file

cfg = os.path.join(self.basedir, MULTI_INTRODUCERS_CFG)

if os.path.exists(cfg):

f = open(cfg, 'r')

for introducer_furl in f.read().split('\n'):

introducers_furl = introducer_furl.strip()

if introducers_furl.startswith('#') or not introducers_furl:

continue

self.introducer_furls.append(introducer_furl)

f.close()

furl_count = len(self.introducer_furls)

#print "@icfg: furls: %d" %furl_count


# read furl from tahoe.cfg

ifurl = self.get_config("client", "introducer.furl", None)

if ifurl and ifurl not in self.introducer_furls:

self.introducer_furls.append(ifurl)

f = open(cfg, 'a')

f.write(ifurl)

f.write('\n')

f.close()

if furl_count > 1:

self.warn_flag = True

self.log("introducers config file modified.")

print "Warning! introducers config file modified."


# create a pool of introducer_clients

self.introducer_clients = []



The first block highlighted in init_introducer_clients tries to read the BASEDIR/introducers file; the second adds introducer.furl from tahoe.cfg if it was not already contained in BASEDIR/introducers.


The second piece of functionality we are interested in is using the existing introducer infrastructure to update the lists of Introducers and Helpers. Below is the relevant code used to subscribe the StorageFarmBroker (responsible for keeping in touch with the StorageServers in the grid) to the Introducer's 'storage' announcements, as an example of how we will have to publish the corresponding 'helper' and 'introducer' announcements:


File: allmydata/storage_client.py


class StorageFarmBroker:

implements(IStorageBroker)

"""I live on the client, and know about storage servers. For each server

that is participating in a grid, I either maintain a connection to it or

remember enough information to establish a connection to it on demand.

I'm also responsible for subscribing to the IntroducerClient to find out

about new servers as they are announced by the Introducer.

"""

(...)


def use_introducer(self, introducer_client):

self.introducer_client = ic = introducer_client

ic.subscribe_to("storage", self._got_announcement)


def _got_announcement(self, key_s, ann):

if key_s is not None:

precondition(isinstance(key_s, str), key_s)

precondition(key_s.startswith("v0-"), key_s)

assert ann["service-name"] == "storage"

s = NativeStorageServer(key_s, ann)

serverid = s.get_serverid()

old = self.servers.get(serverid)

if old:

if old.get_announcement() == ann:

return # duplicate

# replacement

del self.servers[serverid]

old.stop_connecting()

# now we forget about them and start using the new one

self.servers[serverid] = s

s.start_connecting(self.tub, self._trigger_connections)

# the descriptor will manage their own Reconnector, and each time we

# need servers, we'll ask them if they're connected or not.


def _trigger_connections(self):

# when one connection is established, reset the timers on all others,

# to trigger a reconnection attempt in one second. This is intended

# to accelerate server connections when we've been offline for a

# while. The goal is to avoid hanging out for a long time with

# connections to only a subset of the servers, which would increase

# the chances that we'll put shares in weird places (and not update

(...)


The function StorageFarmBroker.use_introducer subscribes to the 'storage' announcements with the callback StorageFarmBroker._got_announcement, which tries to establish a connection with the new server whenever it receives an announcement.


During the StorageServer initialization, the announcement that this server is active is published when the connection with the introducer is ready (with the call to ic.publish):


File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):

implements(IStatsProducer)


(…)



def init_storage(self):

# should we run a storage server (and publish it for others to use)?

if not self.get_config("storage", "enabled", True, boolean=True):

return

readonly = self.get_config("storage", "readonly", False, boolean=True)


storedir = os.path.join(self.basedir, self.STOREDIR)


(..)

ss = StorageServer(storedir, self.nodeid,

reserved_space=reserved,

discard_storage=discard,

readonly_storage=readonly,

stats_provider=self.stats_provider,

expiration_enabled=expire,

expiration_mode=mode,

expiration_override_lease_duration=o_l_d,

expiration_cutoff_date=cutoff_date,

expiration_sharetypes=expiration_sharetypes)

self.add_service(ss)


d = self.when_tub_ready()

# we can't do registerReference until the Tub is ready

def _publish(res):

furl_file = os.path.join(self.basedir, "private", "storage.furl").encode(get_filesystem_encoding())

furl = self.tub.registerReference(ss, furlFile=furl_file)

ann = {"anonymous-storage-FURL": furl,

"permutation-seed-base32": self._init_permutation_seed(ss),

}


current_seqnum, current_nonce = self._sequencer()


for ic in self.introducer_clients:

ic.publish("storage", ann, current_seqnum, current_nonce, self._node_key)


d.addCallback(_publish)

d.addErrback(log.err, facility="tahoe.init",

level=log.BAD, umid="aLGBKw")


To publish the address of a Helper node, we will have to do it after its creation and registration in Client.init_helper (which is the function that initializes the Helper server):


File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):

implements(IStatsProducer)


(…)


def init_helper(self):

d = self.when_tub_ready()

def _publish(self):

self.helper = Helper(os.path.join(self.basedir, "helper"),

self.storage_broker, self._secret_holder,

self.stats_provider, self.history)

# TODO: this is confusing. BASEDIR/private/helper.furl is created

# by the helper. BASEDIR/helper.furl is consumed by the client

# who wants to use the helper. I like having the filename be the

# same, since that makes 'cp' work smoothly, but the difference

# between config inputs and generated outputs is hard to see.

helper_furlfile = os.path.join(self.basedir,

"private", "helper.furl").encode(get_filesystem_encoding())

self.tub.registerReference(self.helper, furlFile=helper_furlfile)

d.addCallback(_publish)

d.addErrback(log.err, facility="tahoe.init",

level=log.BAD, umid="K0mW5w")


A parameter in the helper server's config file will tell whether or not we should publish the helper's address via the introducer.


Regarding the publication of the updated list of Introducers: an IntroducerServer is not connected to other Introducers; however, it can publish a list of introducers that is initially preloaded in BASEDIR/introducers (the same file that would be used by a standard node). We only have to modify the initialization code of the Introducer in allmydata/introducer/server.py to parse the introducers file and publish their announcements with a call to IntroducerService.publish. (Notice that the highlighted _publish function means 'publish this furl to the corresponding tub', i.e. make this furl accessible from the outside; from there we have to issue a call to the IntroducerService to publish the corresponding information.) We may have to connect to every introducer on the list to verify they are up and to recover additional information about them.



File: allmydata/introducer/server.py


class IntroducerNode(node.Node):
    PORTNUMFILE = "introducer.port"
    NODETYPE = "introducer"
    GENERATED_FILES = ['introducer.furl']

    def __init__(self, basedir="."):
        node.Node.__init__(self, basedir)
        self.read_config()
        self.init_introducer()
        webport = self.get_config("node", "web.port", None)
        if webport:
            self.init_web(webport) # strports string

    def init_introducer(self):
        introducerservice = IntroducerService(self.basedir)
        self.add_service(introducerservice)

        old_public_fn = os.path.join(self.basedir, "introducer.furl").encode(get_filesystem_encoding())
        private_fn = os.path.join(self.basedir, "private", "introducer.furl").encode(get_filesystem_encoding())

        (…)

        d = self.when_tub_ready()
        def _publish(res):
            furl = self.tub.registerReference(introducerservice,
                                              furlFile=private_fn)
            self.log(" introducer is at %s" % furl, umid="qF2L9A")
            self.introducer_url = furl # for tests
        d.addCallback(_publish)
        d.addErrback(log.err, facility="tahoe.init",
                     level=log.BAD, umid="UaNs9A")

(…)

class IntroducerService(service.MultiService, Referenceable):
    implements(RIIntroducerPublisherAndSubscriberService_v2)

    (…)

    def publish(self, ann_t, canary, lp):
        try:
            self._publish(ann_t, canary, lp)
        except:
            log.err(format="Introducer.remote_publish failed on %(ann)s",
                    ann=ann_t,
                    level=log.UNUSUAL, parent=lp, umid="620rWA")
            raise

    (…)
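
A minimal sketch of this proposed step, assuming BASEDIR/introducers simply lists one introducer FURL per line; load_alternative_introducers and _make_signed_announcement are illustrative names for this proposal, not existing Tahoe-LAFS code:


import os

def load_alternative_introducers(basedir):
    """Read BASEDIR/introducers and return the list of alternative introducer FURLs."""
    introducers_file = os.path.join(basedir, "introducers")
    if not os.path.exists(introducers_file):
        return []
    with open(introducers_file) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith("#")]

# Inside IntroducerNode.init_introducer, once the Tub is ready, something like:
#
#     for furl in load_alternative_introducers(self.basedir):
#         ann = {"introducer-FURL": furl}
#         # build and sign the announcement, then hand it to the running
#         # IntroducerService via its publish() method (signing omitted here)
#         introducerservice.publish(self._make_signed_announcement(ann), None, self.log)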





Proposed modifications:

Client

File: allmydata/client.py

  • StorageClient:

    • Subscribe to the introducer's 'helper' and 'introducer' announcements, possibly within a new Client.init_subscriptions function.

    • Create the callback function to handle each of these subscriptions and update BASEDIR/helpers and BASEDIR/introducers accordingly (a sketch of this handler follows the list below).

  • HelperServer

    • After initialization of the server in Client.init_helper, publish the corresponding furl in the introducer with a 'helper' announcement.

IntroducerServer

File: allmydata/introducer/server.py

  • IntroducerServer

    • During initialization, read the list of alternative introducers from BASEDIR/introducers.

    • Once the IntroducerService is active, publish the furl of every alternative introducer known to this Introducer instance.
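
As referenced in the StorageClient item above, a minimal sketch of the proposed subscription handling on the client side; the on-disk format of BASEDIR/helpers and BASEDIR/introducers, the announcement key names and all function names are assumptions of this proposal, not existing Tahoe-LAFS code:


import os

def write_furl_list(basedir, filename, furls):
    """Rewrite BASEDIR/<filename> ('helpers' or 'introducers') with one FURL per line."""
    path = os.path.join(basedir, filename)
    with open(path, "w") as f:
        for furl in sorted(set(furls)):
            f.write(furl + "\n")

# Proposed (not yet existing) subscription setup inside Client:
#
#     def init_subscriptions(self):
#         self._known_helpers = set()
#         self._known_introducers = set()
#         for ic in self.introducer_clients:
#             ic.subscribe_to("helper", self._got_helper_announcement)
#             ic.subscribe_to("introducer", self._got_introducer_announcement)
#
#     def _got_helper_announcement(self, key_s, ann):
#         self._known_helpers.add(ann["helper-FURL"])          # key name assumed
#         write_furl_list(self.basedir, "helpers", self._known_helpers)
#
#     def _got_introducer_announcement(self, key_s, ann):
#         self._known_introducers.add(ann["introducer-FURL"])  # key name assumed
#         write_furl_list(self.basedir, "introducers", self._known_introducers)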



GUI

No modifications are needed in the GUI.



Tests

File: allmydata/test/test_introducer.py

  • Class Client: add test cases to verify:

    • That the client properly processes the new 'helper' and 'introducer' announcements.

    • That the client properly updates BASEDIR/helpers and BASEDIR/introducers.

    • That the introducer publishes the list of alternative introducers according to the configuration in tahoe.cfg.

    • That when a client is configured as a HelperServer, it publishes its furl via the introducer according to the configuration in tahoe.cfg.



Documentation

  • Describe the changes implemented in the following files:

    • docs/architecture.rst: add a reference to the automatic update of BASEDIR/introducers and BASEDIR/helpers

    • docs/configuration.rst: describe the new options for StorageClients (auto_update_introducers, auto_update_helpers), for the HelperServer (publish_helper_furl) and for the IntroducerServer (publish_alternative_introducers)

    • docs/helper.rst: describe the new configuration options.








CrashPlan & Symform (FileSystem) I2P + Tahoe-LAFS
Distributed decentralized data X
Encrypted before transmitting X
No file size limits X
Manage password & encryption keys  
Pause backups on low battery  
Pause backups over selected network interfaces  
Pause backups over selected wi-fi networks  
Sync on an inactivity period – configurable bash scripting
Do not produce bandwidth bottlenecks  
Connection through Proxy  
Not enumerating IP X
Resilience X
Storage Balancing X
Summarized volume  
Anonymous X
Sybil Attack protection  
User Disk Quota  

Social Networks

512MB RAM 50MB HDD /PHP


The selected social network is Friendica. It is a federated service.


Its main highlight is the ability to import and export data, posts and likes from other social networks, such as Facebook, Diaspora, Twitter, StatusNet, pump.io, weblogs and RSS feeds - and even email.


It provides a single, centralized point of interaction with each of your profiles on other social networks.



Importer Content Filter


When connectors import data from another social network, it is possible to configure which data is imported and which is not.


Filtering is based on image content, post content and who is posting the information.


For example, you can configure it not to import «cat photos» or «military texts».


It is well known that content filters can give false positives and false negatives. We expect the filter to improve with each release.
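
A minimal sketch of such a rule-based import filter; Friendica connectors are written in PHP, so this Python snippet, with hypothetical field and rule names, only illustrates the idea:


BLOCKED_KEYWORDS = {"cat", "military"}            # e.g. skip «cat photos» or «military texts»
BLOCKED_AUTHORS = {"spam-account@example.com"}    # hypothetical example

def should_import(post):
    """Return True if an imported post passes the content filter."""
    author = post.get("author", "").lower()
    text = post.get("text", "").lower()
    tags = {t.lower() for t in post.get("tags", [])}

    if author in BLOCKED_AUTHORS:
        return False
    if any(keyword in text for keyword in BLOCKED_KEYWORDS):
        return False
    if tags & BLOCKED_KEYWORDS:
        return False
    return True

# {"author": "friend@diaspora.example", "text": "my cat photos", "tags": []}
# would be rejected; as noted above, keyword filters of this kind will always
# produce some false positives and false negatives.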



Content Indexer


The Friendica node indexes posts, images, tags, and users.

The main GUI offers a search for friends and a search for content. Both can be combined, but the user can choose which search to perform.


This index never leaves your computer. Different users have different content index databases.
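
A purely illustrative sketch of a per-user, local-only content index; Friendica itself is PHP/MySQL, and the database name and schema below are assumptions (it also assumes the SQLite build includes FTS5):


import sqlite3

def open_index(db_path="content-index-user1.db"):
    # one database file per user, kept on the local machine only
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS posts "
        "USING fts5(author, body, tags)"
    )
    return conn

def index_post(conn, author, body, tags):
    conn.execute("INSERT INTO posts(author, body, tags) VALUES (?, ?, ?)",
                 (author, body, " ".join(tags)))
    conn.commit()

def search(conn, query):
    # full-text search over the local index; nothing leaves the machine
    return conn.execute("SELECT author, body FROM posts WHERE posts MATCH ?",
                        (query,)).fetchall()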



GUI

The Friendica GUI is redesigned with a modern 2015 look, fully responsive and built on HTML5+CSS3.

A content search box is embedded as well.



Friendica Bug Fixing

Like all software, Friendica has errors reported on its bug tracker that need to be solved. The full list is included in the Friendica development Appendix 3.3.


TODO: tor + certificates + federation


Specification

Facebook Diaspora Friendica
OpenID Login    
 
Search for people x x
search for places    
search for things x x
update status x x
add photos x x
add video x x
add friends x x
add links x x
add advertisements    
 
send messages x x
multi conversations x  
video conversations    
mute conversation    
change a name of multi conversation    
be online/offline    
see if somebody uses facebook on phone or computer    
block chat for specific groups/people    
turn off chat for specific groups/people    
turn on chat only for specific groups/people    
use chat    
use of emoticons    
use stickers  
send links/photos/videos in conversation x x
word-searcher in full conversation    
archive conversation    
delete message/conversation    
report as spam or abuse    
mark as read/unread    
messenger shows the hour of sending the message x  
 
create pages   x
create poll x  
create ads    
like things x x
comment things x x
share things x x
pokes    
edit posts x x
edit status x x
watch activities x x
news feed x  
play games    
 
create events   x
edit profile of the event   x
option: participate/maybe/decline in events    
shows weather forecast for the day of the event    
invite your friends to event   x
remove yourself from guest list    
export event   x
 
create groups   x
manage your group    
pin posts    
private/ open/ closed group    
join groups    
leave groups    
stop notification   x
add photos    
add members    
add files    
add events    
ask questions    
change administrator    
report group    
 
follow/unfollow friends x  
follow/unfollow posts x  
 
tag people in photos   x
tag people in posts/ status/    
add description for picture   x
 
edit/add profile picture x x
add/change cover photo    
update personal information: x x
· Work and Education    
· Relationship    
· Family    
· Places Lived x  
· Basic Information x  
· Contact Information    
· Life Events    
· Interests x x
manage sections    
create albums    
 
add friends x x
unfriend x x
suggest friends to other person   x
divide friends into groups (e.g. close friends, acquaintances) x x
activity log x x
 
change general account settings x x
edit security settings   x
extra protection for people under 18    
privacy settings concerning added stuff by you x x
restrictions about who can contact you   x
restrictions about looking up    
blocking apps/ games/ advertisements/ events/ users only users only users
possibility of choosing the way of getting notifications (e-mail, messages, on facebook) E-mail only E-mail only
decide who can follow you    
payment settings    
 
application for mobile phone x x
help service x x
report problems x x
users can translate network to other languages    
translations are approved by users in a vote    
 
message sending with pressing enter    
can connect with other networks x x

Development Timeline

TIMELINE - All sorted By Priority Development Hours 161,25h Cost 6450€

First Month(all TODO) 161,25h 6450€
PHP Fatal error accessing profile pages with a lot of posts 1,25h 50€
Navigating to index page with HTTPS forced does not redirect to HTTPS. 3,75h 150€
poller.php error 1,25h 50€
Impossible to make an introduction 2,5h 100€
button breaks the theme 1,25h 50€
private message is not visible 1,25h 50€
Same id for original status and retweeted status. 2,5h 100€
Spaces are Being Removed from Photo URLs 1,25h 50€
Do prevent stream from jumping around when new posts arrive 1,25h 50€
Browser UserAgentString for WebOS missing 3,75h 150€
Infinite duplicate posts in Facebook 1,25h 50€
posts to other people's walls can't be edited 2,5h 100€
openid failure with a server that has multiple openid-s 1,25h 50€
Feature Request: A Home-Button 1,25h 50€
Feature Request: PGP Clearsigning Beautification 3,75h 150€
Scheduled Posts 5h 200€
Image upload in comments impossible 2,5h 100€
Improve emoticons 2,5h 100€
Posting a new comment shows a (1) counter at the home menu item. 2,5h 100€
EveryAuth Login Integration (www.everyauth.com) 2,5h 100€
XMPP/Jabber integration (www.conversejs.org) 1,25h 50€
- option: participate/maybe/decline in events 2,5h 100€
- remove yourself from event guest list 1,25h 50€
follow/unfollow friends 1,25h 50€
follow/unfollow posts 1,25h 50€
tag people in posts/ status/ 2,5h 100€
add/change cover photo 1,25h 50€
Add Profile Information 3,75h 150€
· Work and Education    
· Relationship    
· Family    
· Places Lived    
· Basic Information    
· Contact Information    
· Life Events    
create photo albums 2,5h 100€
extra protection for people under 18 2,5h 100€
Allow if you want to be searched 1,25h 50€
“possibility of choosing the way of getting notifications (e-mail, messages, on facebook)“Now it's email only 2,5h 100€
decide who can follow you 2,5h 100€
Friendly UI redesign:Wireframe redesign & layout front end 40h 1600€
Friendly UI redesign:Development front end CSS3 40h 1600€
Friendly UI redesign:Develop connections on front end 10h 400€

Search Engine

300MB RAM 25GB HDD /Java


The chosen search engine is YaCy. It is developed in Java, with a distributed database that is shared when users make a query.

This way, no single computer has to hold all of the crawled internet content.


Users can access the YaCy search engine from a regular URL, or integrated into OwnCloud.



Webcrawler

YaCy is modified to be able to index, at the same time: intranet, extranet, darknets (I2P/Tor) and internal plugins.


Some of the indexed content will be shared with other YaCy nodes and some will not (a sketch of this sharing policy follows the list below):

- Extranet (regular internet) results are shared with all YaCy nodes

- Intranet results are shared with other YaCy nodes on the same intranet.

- DarkNet results are shared with other DarkNet YaCy installations.

- Internal plugin results are not shared with anybody.
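
A minimal sketch of this sharing policy as a source-to-scope mapping (YaCy is written in Java; the names here are illustrative only):


SHARING_POLICY = {
    "extranet": "all-yacy-nodes",       # regular internet: shared with every YaCy peer
    "intranet": "same-intranet-nodes",  # shared only with peers on the same intranet
    "darknet": "darknet-nodes",         # shared only with other darknet (I2P/Tor) installations
    "plugin": "local-only",             # OwnCloud/Friendica/email plugin results never leave the node
}

def sharing_scope(source):
    """Map the source of an index entry to who may receive it."""
    return SHARING_POLICY.get(source, "local-only")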



Search page

When the user is on the YaCy search page, they can select which results they want to get. By default all sources are checked: internal, external, darknets and internal plugins.



WebCrawler Internal Plugin: OwnCloud

This YaCy plugin wraps the indexing that OwnCloud has already done (see the OwnCloud file indexing section).

YaCy can show OwnCloud files and their content, redirecting to the OwnCloud installation.



WebCrawler Internal Plugin: Friendica

This YaCy plugin wraps the indexing that Friendica has already done (see the Friendica content indexing section).

YaCy can show Friendica people, posts and images, redirecting to the local Friendica installation.



WebCrawler Internal Plugin: Emails

This YaCy plugin indexes the emails on the system.

YaCy indexes the mail subject, sender, recipients and attachments if they are not encrypted; otherwise it indexes whatever it can.

If an email is GPG encrypted, it will likewise index what it can.
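
To illustrate which fields such a plugin could extract, a small sketch using Python's standard email library; the real plugin would live inside YaCy (Java), and the field names are our own:


import email
from email import policy

def extract_index_fields(raw_bytes):
    """Extract indexable fields from a stored RFC 822 message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    fields = {
        "subject": msg.get("Subject", ""),
        "sender": msg.get("From", ""),
        "recipients": msg.get_all("To", []) + msg.get_all("Cc", []),
        "attachments": [],
        "body": "",
    }
    body_part = msg.get_body(preferencelist=("plain",))
    if body_part is not None:
        body = body_part.get_content()
        # if the body is GPG encrypted we can only index the headers
        if "BEGIN PGP MESSAGE" not in body:
            fields["body"] = body
    for part in msg.iter_attachments():
        fields["attachments"].append(part.get_filename() or "unnamed")
    return fields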



Search Improvement

YaCy's search results need to be improved in order to be fully competitive with Google.



OwnCloud Integration

On the OwnCloud main GUI there is a YaCy search box, in the same way Google search is integrated with Gmail.


It uses the JSON query URL to get YaCy results directly and show them inside OwnCloud.






The YaCy JSON API query URL is http://localhost:8090/yacysearch.json?query=microsoft


The results are then shown on the frontend.
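
A minimal sketch of such a query against the local YaCy instance; the response layout assumed here (a "channels" list containing "items" with title and link) matches YaCy's OpenSearch-style JSON, but should be verified against the deployed version:


import json
import urllib.parse
import urllib.request

def yacy_search(query, base="http://localhost:8090"):
    """Query the local YaCy JSON API and return a list of {title, link} results."""
    url = base + "/yacysearch.json?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    results = []
    for channel in data.get("channels", []):
        for item in channel.get("items", []):
            results.append({"title": item.get("title"), "link": item.get("link")})
    return results

# yacy_search("microsoft") would return the entries the OwnCloud search box renders.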




YaCy Bug Fixing

Like all software, YaCy has errors reported on its bug tracker that need to be solved. The full list is included in the YaCy development Appendix 3.4.



Google & Bing & Yahoo YaCy
Competitive search results (if you search for a word such as “kademar”, the website www.kademar.org should appear in first place) & (important links inside this website should appear first) & (improve website search result ordering by relevancy) & (improve search results for full search sentences)  
Search for:
- text X
- images X
- videos  
- shopping  
- maps  
- news  
- books  
- flights  
- apps  
- celebrity  
The ability to control keys X
Related search (in the bottom) X (keywords in sidebar, sentences in results)
Language autodetection based on browser language or address, for example when I write google.co.uk, it displays the message “This site is available in English”  
Change language by selecting your localisation in settings Deutsch, you can change this in preferences. Five languages are available
Case insensitive  
Search pages from the world or just the selected language (browser / chosen language)  
Search operators:
- Search in title (intitle:) X
- Search in url (inurl:) X
- Search info (info:)  
- Search cache (cache:)  
- Search for a number range (eg. camera $50..$100)  
- Search for either word (eg. world cup location 2014 OR 2018)  
- Fill in the blank (eg. “a * saved is a * earned”)  
- Search for pages that are similar to a URL (eg. related:time.com)  
- Search for pages that link to a URL (eg. link:google.com)  
- Search within a site or domain (eg. olympics site:nbc.com)  
- Exclude a word (eg. “jaguar speed -car” or “pandas -site:wikipedia.org”)  
- Search for an exact word or phrase (eg. “imagine all the people”)  
  - inlink:
  - author:
  - tld:
  - /ftp
  - /http
  - /date
  - /near
  - /smb
  - /file
Stop-list, that is, words that are not taken into account (a, the, on, at, in, and, of, punctuation)  
Apart from standard files html/htm, php/php3, xhtml, asp and indexes other types like: txt, ans, pdf, ps, doc, xls, ppt, wks, wps, wdb, wri, rtf, swf, wk1, wk2, wk3, wk4, wk5, wki, wks, wku, lwp, mw  
- search for filetypes (eg. “filetype:odt”)  
Spelling dictionary ?
Extras:
- calculator  
- unit converter  
- currency converter  
- definitions  
- map  
Search by voice  
Search tools for text: Search tools for text:
- by language  
- by country  
- by date  
- by search near  
Onscreen keyboard  
  - by type of file
  - by type of server
  - by url
Safe Search filter  
Dynamic search (dynamic display of results when typing) X
Localization by geolocalization (IP)  
Knowledge graph: when you type “Torun” it displays information about the city, sometimes the weather (it seems this option has been dropped).  
In search results for the most important results are displayed:
- link to page with results for pictures  
- link to page with results for videos  
- link to page with results for news  
- related (eg. people)  
- for sportsmen changing background (mundial)  
Search engine designed for mobile devices  
Notification “Looking for results in English?” (English for example)  
Enter an expression incorrectly and it will be found in the correct form (in addition it displays “Showing results for: something”, “Search instead for: something”)  
The ability to remove information from google (but very hard to do)  
Search images:
- by color  
- by size  
- by type  
- ad management system (google adsense / adwords)  
- by usage rights  
- by images  
- by person  
- by structure  
- other: Top gallery  
  - by type of file
  - by type of server
  - by url
Search history  
Autocomplete  
Personalize Search Screen  
Webcrawler for internet and external  

Development Plan

TIMELINE - All sorted By Priority Development Hours 626h Cost* 15€/h based on a 2400€ salary 23.475€

First Month(bug month) 272h. 10.200 €
* Performance Issues: http://mantis.tokeek.de/view.php?id=305 32h. 1200€
* “Too many open files” while searching&crawling: http://mantis.tokeek.de/view.php?id=406 16h. 600€
* Unable to list Process Scheduler: http://mantis.tokeek.de/view.php?id=290 8h. 300€
* Yacy does not start: http://mantis.tokeek.de/view.php?id=420 24h. 900€
* index 100% CPU: http://mantis.tokeek.de/view.php?id=81 24h. 900€
* improve YaCy Web UI: http://mantis.tokeek.de/view.php?id=151 16h. 600€
* CPU cycles: http://mantis.tokeek.de/view.php?id=418 24h. 900€
* Huge Ram Eater: http://mantis.tokeek.de/view.php?id=282 32h. 1200€
* Young mode and DHT issue: http://mantis.tokeek.de/view.php?id=150 24h. 900€
* SSL Init Fail: http://mantis.tokeek.de/view.php?id=251 16h. 600€
* Infinite crash after one “not enough free space”: http://mantis.tokeek.de/view.php?id=144 24h. 900€
* YaCy cant boot anymore after setting up SSL: http://mantis.tokeek.de/view.php?id=323 8h. 300€
* Improve search algorithm: http://mantis.tokeek.de/view.php?id=283 16h. 600€
* Search engine designed for mobile devices (responsive) 18h. 675€

Second Month 246h 9.225 €
* out of memory on big index: http://mantis.tokeek.de/view.php?id=376 32h. 1200€
* Search pages from the world or just the selected language (browser / chosen language) 8h. 300€
* Apart from standard files html/htm, php/php3, xhtml, asp and indexes other types like: txt, ans, pdf, ps, doc, xls, ppt, wks, wps, wdb, wri, rtf, swf, wk1, wk2, wk3, wk4, wk5, wki, wks, wku, lwp, mw 24h. 900€
* Increase search frequency: http://mantis.tokeek.de/view.php?id=419 16h. 600€
* Stop-list, that is, words that are not taken into account (a, the, on, at, in, and, of, punctuation) 4h. 150€
* Bandwidth limiter: http://mantis.tokeek.de/view.php?id=165 24h. 900€
* Network Autoclean old entries: http://mantis.tokeek.de/view.php?id=20 16h. 600€
* Change YaCy process priority: http://mantis.tokeek.de/view.php?id=73 16h. 600€
* Case insensitive 4h. 150€
* Search pages from the world or just the selected language (browser / chosen language) 8h. 300€
* Search for pages that link to a URL (eg. link:google.com) 8h. 300€
* Search within a site or domain (eg. olympics site:nbc.com ) 8h. 300€
* Search for an exact word or phrase (eg. “imagine all the people”) 4h. 150€
* search for filetypes (eg. “filetype:odt”) 8h. 300€
* Import Open StreetMap data in YaCy: http://mantis.tokeek.de/view.php?id=226 32h. 1200€
* Personalize SearchEngine Screen 18h. 675€
* Onscreen keyboard 16h. 600€

Third Month 100h 3750 €
* Knowledge graph: when you type “Torun” it displays information about the city, sometimes the weather. 40h. 1500€
* Competitive search results (if you search for a word such as “kademar”, the website www.kademar.org should appear in first place) & (important links inside this website should appear first) & (improve website search result ordering by relevancy) & (improve search results for full search sentences) 60h. 2250€

Collaborative Document Editing - OwnCloud


OwnCloud already has ODT (text) editing based on webodf.org technology.


When OwnCloud gains end-to-end JavaScript encryption («secure mode»), the way the collaboration suite makes its connections needs to change, to make online editing possible in an end-to-end encrypted scenario.



New Network Connection Model: 1 user


When a user enters OwnCloud to edit a document:


- The browser loads the document editor JS.

- The document file is loaded, and OwnCloud records that this user is editing the document (master connection).

- The master connection is the only one that can save the document to OwnCloud.

- If it detects that the file is encrypted, it asks for the document password to decrypt it.

- The document editing suite and the document are loaded in the user's RAM. When the document is saved, it is encrypted with the password held in memory and sent back to the server (see the sketch after this list).

- On close, OwnCloud removes the record that somebody is editing this document.
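
In the proposal this encryption step runs as JavaScript in the browser; the following Python sketch (PBKDF2 key derivation plus AES-GCM, with parameters chosen only for illustration) shows the idea of encrypting the document with the in-memory password before it is sent back to the server:


import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def derive_key(password, salt):
    # derive a 256-bit key from the document password held in memory
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=200000)
    return kdf.derive(password.encode("utf-8"))

def encrypt_document(plaintext, password):
    salt = os.urandom(16)
    nonce = os.urandom(12)
    key = derive_key(password, salt)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return salt + nonce + ciphertext          # this blob is what goes back to the server

def decrypt_document(blob, password):
    salt, nonce, ciphertext = blob[:16], blob[16:28], blob[28:]
    key = derive_key(password, salt)
    return AESGCM(key).decrypt(nonce, ciphertext, None)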



New Network Connection Model: multiple users – Realtime P2P


User 2 can start editing the same file by using a direct link or the same OwnCloud GUI (slave connection).


While the interface is loading, OwnCloud sends a notification to User 1. At that moment, User 1's OwnCloud GUI saves the document and locks the interface while the new member joins.


User 2 downloads the current document version, and their OwnCloud asks for the decryption password.


When User 2 is connected, User 1's interface is unlocked.


The master connection and the slave connections talk peer-to-peer, without intermediate nodes.


Each modification, pointer position and change is shared between users directly.


They see changes simultaneously, but only the master connection writes changes to OwnCloud.


When User 1 (master connection) disconnects, the master connection mark is handed over to User 2. From then on, User 2 writes the document.

If User 1 reconnects, or User 3 connects, the master connection remains with User 2.
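
A purely illustrative sketch of the master-connection bookkeeping described above; the real coordination would live in the OwnCloud/JavaScript layer:


class EditingSession:
    def __init__(self):
        self.editors = []          # join order; editors[0] holds the master connection

    def join(self, user):
        if user not in self.editors:
            self.editors.append(user)
        return self.master()

    def leave(self, user):
        if user in self.editors:
            self.editors.remove(user)
        # after removal, editors[0] (the longest-connected remaining user)
        # automatically holds the master connection
        return self.master()

    def master(self):
        return self.editors[0] if self.editors else None

# session = EditingSession()
# session.join("user1"); session.join("user2")     # user1 is master
# session.leave("user1")                           # user2 becomes master
# session.join("user1"); session.join("user3")     # master remains user2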



Support more files


We contribute to WebODF to create ODS (spreadsheet) and ODP (presentation) viewers, and then editors.


We also contribute to WebODF to add more features to their ODT document editor.


The OwnCloud editing suite will then be a fully working office suite.


MailPile


The conferencing solution is XMPP + OTR (encryption) + WebRTC (video).


There is an OwnCloud plugin already developed that does this: https://apps.owncloud.com/content/show.php/JavaScript+XMPP+Chat?content=162257

It uses the ejabberd XMPP server.


We will compile ejabberd with Tor support (https://spaceboyz.net/~astro/ejabberd-2.0.x+tor.patch).


We provide a set of «grey» CommunityCube XMPP servers. Those servers run Prosody with mod_onions, which allows them to connect through Tor or to regular servers, creating bridges between the two networks.


When using OwnCloud + XMPP, the user can choose to use our grey CommunityCube Jabber servers or a regular server like jabber.ccc.de.


The user could also connect with a regular Jabber client.



Encryption


Encryption of XMPP communications is handled by the OTR XMPP protocol extension.




Video conferencing will be handled by WebRTC.

WebRTC needs a STUN/TURN server to broker connections; we will create servers such as webrtc.communitycube.net.
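
As an illustration, a peer connection pointed at such a STUN server; real clients would be browsers using the JavaScript WebRTC API, aiortc is used here only to keep the example in Python, and the port is simply the standard STUN port rather than a confirmed deployment detail:


import asyncio
from aiortc import RTCPeerConnection, RTCConfiguration, RTCIceServer

async def make_peer_connection():
    config = RTCConfiguration(iceServers=[
        RTCIceServer(urls="stun:webrtc.communitycube.net:3478"),
    ])
    pc = RTCPeerConnection(configuration=config)
    pc.createDataChannel("chat")                   # opening a channel triggers ICE gathering
    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    return pc                                      # the offer SDP is then exchanged via the broker

if __name__ == "__main__":
    pc = asyncio.run(make_peer_connection())
    print(pc.localDescription.type)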


Proposal tech

http://www.html5rocks.com/en/tutorials/webrtc/basics/

http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/


As the connection broker server we use PeerJS Server (a WebRTC connection broker).

PLEASE DONATE AND SHARE!

Kickstarter Campaign

Thank you for your awesome support!