dirstat/2(33)

DirStat 2

----------------------------------------------------------------------[ Meta ]--

name		dirstat/2
section		33
description	DirStat 2
tags		dirstat postgresql programs java
encoding	utf8
compliance	public
lang		en
creation	2014/01/27
download	main
dlink		http://k007.kiwi6.com/hotlink/wl8bpb1vxc/dirstat2.7z
ddescr		DirStat 2
dsize		2460
dchcksm		d12dd27d776b97bc9d22afc304d6336e3d3c0d4566210b8ae2e7e8063963b1f0
copyright	Copyright (c) 2014, 2016 Ma_Sys.ma.
		For further info send an e-mail to Ma_Sys.ma@web.de

--------------------------------------------------------------[ Introduction ]--

DirStat 2 is a powerful tool to analyze directory structures. If you encounter
an HDD filled with many files, folders and subfolders, you might want to know
which files are the largest, which file types take up the most space, which
users the files belong to, or other questions you cannot answer directly using
the utilities provided by the platform, especially on Windows systems.

On Linux you could write scripts for that purpose, but filenames can be very
troublesome in scripts, and listing all that information neatly would require
a complex set of scripts that would still be inferior to DirStat 2.

DirStat 2 addresses these questions: You can scan a directory structure
(refer to ``Scanning'') and analyze it later -- even on a separate system if
you prefer your familiar working environment. The file metadata is stored in a
PostgreSQL database, which allows the user to run arbitrary queries to get
whichever information is required. The DirStat 2 GUI (see ``Evaluation'')
presents important information at a glance and can be extended with
user-written JavaScript code to provide very specific views on the data.

-------------------------------------------------------------------[ Credits ]--

This program includes and uses the PostgreSQL JDBC driver and NetBeans' `Outline`.

---------------------------------------------------[ Prerequisites and Setup ]--

Depending on your situation/location/system, you will want to choose different
ways of installing PostgreSQL. The most comfortable way is to install the
package provided by your favorite Linux distribution or to use the normal
Windows setup. If you are on a foreign system and do not run the database on a
separate analysis system (probably because none is available), you might want
to use the PostgreSQL portable package.

Depending on your installation, you may need to create a username and password,
but regardless of how you installed PostgreSQL, you will need to create the
database manually as described in ``PostgreSQL Setup''.

When run for the first time, DirStat 2 automatically creates tables and
indices as necessary. To clear all tables (without deleting the database
itself), the script `drop.sql` is provided for convenience. You can run it via

	$ psql masysma_dirstat
	masysma_dirstat=> \i drop.sql

You might need to enter the full path to `drop.sql`. In that interactive shell,
you can always perform ad-hoc queries without writing your own script and
without using the GUI. To find out which attributes and tables are defined,
extract `ma/dirstat/init.sql` from your DirStat 2 JAR file.

If you want to remove all traces of DirStat 2's database from your system, run
the following commands after the drop described above (replace `linux-fan` with
the role you created as described in ``PostgreSQL Setup''):

	# su postgres
	$ psql
	DROP DATABASE masysma_dirstat;
	DROP ROLE "linux-fan";

----------------------------------------------------------[ PostgreSQL Setup ]--

For background information, see
url(http://www.cyberciti.biz/faq/howto-add-postgresql-user-account/)
and url(http://www.postgresql.org/docs/9.3/static/sql-createrole.html).
In short, run the following commands:

	# su postgres
	$ psql
	CREATE ROLE "linux-fan" WITH LOGIN PASSWORD 'testwort';
	CREATE DATABASE masysma_dirstat;
	GRANT ALL PRIVILEGES ON DATABASE masysma_dirstat TO "linux-fan";

On Windows systems, you can get the ``SQL Shell (psql)'' from the GUI menu.

You may choose arbitrary usernames and passwords to suit your needs, but
the database name `masysma_dirstat` is fixed (unless you recompile DirStat 2).

------------------------------------------------------------------[ Scanning ]--

If you only intend to perform one scan with standard settings, enter

	$ java -jar dirstat2.jar --scan --db=user:password@host

This scans all drives on Windows and the root directory on Unix or Linux.
To scan a specific directory instead, add the `--src` option:

	$ java -jar dirstat2.jar --scan --db=user:password@host --src=/etc

To give the scan a name (the default name is `default`), add the `--name`
option:

	$ java -jar dirstat2.jar --scan --db=user:password@host --name=linux1

Names must be unique within a database. The `--db` parameter specifies the
PostgreSQL user and password as well as the host (and, optionally, the port)
the server is running on.

While the scan is running, DirStat 2 prints status information; see
``Scan status'' for details. Further options for scanning additional metadata
are listed by

	$ java -jar dirstat2.jar --help

To disable SSL, prepend ``INSECURE:'' to the login data, e.g.
`--db=INSECURE:user:password@host`. This is often necessary on Windows systems.
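
The `--db` value bundles several fields. The following JavaScript is a minimal
sketch of how such a string could be split up. It assumes the format
`[INSECURE:]user:password@host[:port]`, that the user name contains no `:`,
that the host part contains no `@` (so IPv6 addresses are not handled), and
that PostgreSQL's default port 5432 applies when none is given. `parseDbSpec`
is a hypothetical helper, not part of DirStat 2, whose actual parsing rules may
differ.

```javascript
// Hypothetical helper (not part of DirStat 2): splits a --db value of the
// assumed form [INSECURE:]user:password@host[:port] into its parts.
function parseDbSpec(spec) {
	var ssl = true;
	if (spec.indexOf("INSECURE:") === 0) {
		// the prefix only switches off SSL; strip it before parsing the rest
		ssl = false;
		spec = spec.substring("INSECURE:".length);
	}
	// split at the LAST "@" so that a password may itself contain "@"
	var at = spec.lastIndexOf("@");
	if (at === -1) {
		throw new Error("missing @host in " + spec);
	}
	var login = spec.substring(0, at);
	var location = spec.substring(at + 1);
	// split at the FIRST ":" so that a password may itself contain ":"
	var colon = login.indexOf(":");
	if (colon === -1) {
		throw new Error("missing :password in " + spec);
	}
	var host = location;
	var port = 5432; // assumed fallback: PostgreSQL's default port
	var m = /^(.*):(\d+)$/.exec(location);
	if (m !== null) {
		host = m[1];
		port = parseInt(m[2], 10);
	}
	return {
		user: login.substring(0, colon),
		password: login.substring(colon + 1),
		host: host,
		port: port,
		ssl: ssl
	};
}

var db = parseDbSpec("INSECURE:linux-fan:testwort@localhost:5433");
console.log(db.user, db.host, db.port, db.ssl); // linux-fan localhost 5433 false
```

Splitting at the last `@` and the first `:` is a deliberate choice here: it
keeps passwords containing either character intact, at the cost of forbidding
them in the user and host parts.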

----------------------------------------------------------------[ Evaluation ]--

To open the GUI for evaluation you can use the following command:

	$ java -jar dirstat2.jar --eval --db=user:password@host --name=linux1

This will open the scan named `linux1` in the GUI. Multiple GUI instances can
view the same dataset (there is a real database behind it, after all), and it
is possible to have another scan (with a different name) running while the GUI
is open. It is not recommended to view the data you are currently scanning
because it might be temporarily inconsistent (this is not a database issue but
a design choice of the scanning facility to maximize scanning performance).

The GUI should look similar to the following pseudo-graphic:

	/--[ DirStat 2 ]------------------------------------------[ _ [] X ]--\
	| Path          Pattern  Files  Size  Errors  Empty  Biggest etc.     |
	| linux1                                                              |
	|   Extensions  *.*      4096   10M   5       10     1M               |
	|     txt       *.txt    600     2M   0        2     64K              |
	|   FS Tree     /        4096   10M   5       10     1M               |
	|     /test     scanp    2048    5M   0        2     512K             |
	|     etc.                                                            |
	| General Information  | Empty files         | File Size Distribution |
	| .................... | ................... | ...................... |
	| . Key     . Value  . | . /test/empty.txt . | .                    . |
	| .................... | . /test/emty.txt  . | .          *         . |
	| . Objects . 5000   . | . /test/.empx.txt . | .   *      **        . |
	| . etc.    . etc.   . | . /test/.testrc   . | .  **     ****  *    . |
	| .................... | ................... | ...................... |

As you can see, it is divided into four parts. The first and main part displays
the so-called ``views'', which can show a file system structure, the files
grouped by extension or similar. The views are displayed at the top of the
frame. The three other parts are a table of general information, a list of
empty files and a diagram of the file size distribution whose x-axis is
logarithmic to allow the user to view large file sizes (like 1 TiB) and small
file sizes (like 4 KiB) in one diagram. The three lower parts always refer to
the entry you have selected in a view in the top panel.

To make advanced use of the views, you can attach a new view below an existing
one. The new view will not be displayed ``below'' that node in the tree
structure, but it will only contain entries which occur below the node it was
attached to. To view the file system structure of all `.txt`-files, you might
select the entry ``txt'' in the ``Extensions''-view and use the context menu to
attach a new ``Default/tree.js'' view. The view will show you a subset of the
scan in hierarchical tree form (like in the real filesystem), but it will only
contain files and folders which are (or contain) `.txt`-files. Similarly, you
can attach a ``Default/extension.js'' view below a node from the ``FS Tree'' to
view the file types below a specific directory.
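
Which entries such a tree view attached below ``txt'' contains can be stated
compactly: every matching file plus every directory on the path to one. The
sketch below computes that set from plain path strings, assuming `/`-separated
absolute paths and treating names that start with a dot and have no further dot
(like `.testrc`) as having no extension. `treeSubset` is a hypothetical helper;
DirStat 2 itself derives the subset from the database via filters, and this
only illustrates the result.

```javascript
// Hypothetical sketch: the entries of a tree view attached below an
// extension node, i.e. the matching files plus all of their ancestors.
function treeSubset(paths, ext) {
	var keep = {};
	paths.forEach(function (p) {
		var name = p.substring(p.lastIndexOf("/") + 1);
		var dot = name.lastIndexOf(".");
		// assumption: ".testrc" has no extension (dot at position 0)
		if (dot === -1 || dot === 0 || name.substring(dot + 1) !== ext) {
			return;
		}
		keep[p] = true;
		// walk up the directory chain: /a/b/c.txt -> /a/b -> /a
		var dir = p.substring(0, p.lastIndexOf("/"));
		while (dir.length > 0) {
			keep[dir] = true;
			dir = dir.substring(0, dir.lastIndexOf("/"));
		}
		keep["/"] = true; // the file system root contains everything
	});
	return Object.keys(keep).sort();
}

console.log(treeSubset(["/test/empty.txt", "/etc/fstab"], "txt"));
// [ '/', '/test', '/test/empty.txt' ]
```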

To get more specialized information about the scan, you can also write your
own views in JavaScript and attach them below other views, or add them to the
GUI directly by attaching them below the scan name (here: ``linux1''). The
API for writing your own view is documented in the section ``Writing your own
view'' below.

---------------------------------------------------------------[ Scan status ]--

During the scan, DirStat 2 prints status lines about errors and the scan
progress. As there is only one scan pass, DirStat 2 does not know how far along
it is -- there is no ``real'' progress indicator. By comparing the amount of
data scanned with your HDD's fill level, you can get a rough estimate.
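
That comparison is simple arithmetic: divide the size scanned so far by the
used space of the scanned file system. The sketch below does this under the
assumption that both values are known in bytes; `estimateProgress` is a
hypothetical helper. Hard links, sparse files or compression can make file
sizes differ from the used space, so the result is only a rough guide.

```javascript
// Hypothetical helper: rough scan progress in percent, computed from the
// total size scanned so far and the used space of the scanned file system.
function estimateProgress(scannedBytes, usedBytes) {
	if (!(usedBytes > 0)) {
		return 0; // nothing known about the fill level
	}
	var pct = 100 * scannedBytes / usedBytes;
	// cap at 99 %: only the end of the scan proves that it is complete
	return Math.min(99, Math.round(pct));
}

console.log(estimateProgress(120e9, 480e9)); // 25
```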

A normal status line consists of key-value associations which contain
information about the number (or size) of items scanned and the change since
the last status display, which is marked with a `+` sign.

	  Scan Status Symbol Explanation
	  S  Name         Description
	  f  Files        The number of files scanned.
	  d  Directories  The number of directories completely scanned.
	  e  Errors       The number of errors that occurred.
	  s  Size         The total size of all entities scanned.
	  c  Committed    The number of queries sent to the database.
	  q  Queries      The total number of queries ``queued''.

On fast filesystems, it is normal for the filesystem scan to finish before all
data has been committed to the database. When scanning a normal HDD for the
first time, however, the filesystem scan time is likely to outweigh the
database query time.
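
The `+` values and the gap between scanning and committing can be computed from
two consecutive status snapshots. The sketch below represents a snapshot as an
object keyed by the symbols from the table. Both that representation and the
reading of q - c as the number of queries still waiting for the database are
assumptions for illustration; they do not describe DirStat 2's actual output
format.

```javascript
// Hypothetical sketch: changes between two status snapshots and the number
// of queries that are queued but not yet committed. The snapshot objects
// use the symbols from the table above; this format is assumed.
var SYMBOLS = ["f", "d", "e", "s", "c", "q"];

function statusDelta(previous, current) {
	var delta = {};
	SYMBOLS.forEach(function (sym) {
		delta[sym] = current[sym] - previous[sym];
	});
	return delta;
}

function pendingQueries(current) {
	// assumption: q counts every queued query, c the ones already sent
	return current.q - current.c;
}

var before = { f: 4000, d: 300, e: 5, s: 9e6, c: 3500, q: 4300 };
var after = { f: 4096, d: 310, e: 5, s: 10e6, c: 4000, q: 4406 };
console.log(statusDelta(before, after).f, pendingQueries(after)); // 96 406
```

A snapshot whose `f` delta is 0 while `pendingQueries` is still positive
corresponds to the situation described above: the filesystem scan is done,
but the database is still catching up.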

-----------------------------------------------------[ Writing your own view ]--

To write your own view, create a JavaScript file with the following skeleton:

	function MyView() { }
	MyView.prototype = new ViewJS();
	var ref = MyView.prototype;
	ref.root = null;
	ref.create = function() {
		this.root = this.createNode(null);
		this.root.id = "Extension";
		this.root.pattern = "*.*";
		// ...
	};
	ref.populateChildren = function(node) {
		// ...
	};
	ref.createFilter = function(node) {
		return this.createDefaultFilter("files.ext = ?",
							[ node.userdata ]);
	};
	function create_view() { return new MyView(); }

These functions have been copied from `extension.js`. Your own view will of
course need a real name (although ``MyView'' would also work) and its own
implementation of the methods, which are described below.

create()
	This function is invoked when your view is added. You should do basic
	initialization here. If your view will not dynamically add new nodes
	on demand but create them all at the beginning (like the Extension view
	does), you should do the node creation here. At least the root node
	should be created here.

populateChildren(node)
	If your view dynamically creates the children on demand, implement
	this method, which is invoked when the children should be created. The
	node parameter is either your root node or one of its children. To mark
	that this node's children are now available, call `node.setReady()`.

createFilter(node)
	This method is invoked if another view is to be attached below your view
	or the secondary panels are to be updated for the node given as
	parameter. `default.js` has a convenience function
	`createDefaultFilter` which allows you to write part of a prepared
	`WHERE` statement and give the parameters as a JavaScript array.
	This method returns the created filter, as you can see in the skeleton.

create_view()
	This (not object-oriented) function is invoked to create an object of
	your view. It should normally not do anything beyond creating the
	object, as it might block the GUI. Do all initialization in `create`.

All views have to extend `ViewJS`, which is implemented in `default.js`. To get
an overview of the available methods, extract `default.js` from the DirStat 2
JAR file and scan it for useful methods. For further information about
implementing a view, feel free to extract `extension.js` and `tree.js` from the
JAR -- they both contain working views and show you how to use the functions
you find in `default.js`.
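
To see the calling order of the view methods outside of DirStat 2, the
following sketch fills the skeleton with a hypothetical `SizeClassView` that
creates its children lazily in `populateChildren`. Because `default.js` is not
available here, a small stand-in for `ViewJS` imitates `createNode` and
`node.setReady()`. How these behave (e.g. that `createNode(parent)` registers
the new node as a child of `parent`) is an assumption, so drop the stand-in and
check `default.js` before using anything like this inside DirStat 2.
`createFilter` is omitted for brevity.

```javascript
// Stand-in for ViewJS (really implemented in default.js inside the DirStat 2
// JAR). Only the members used below are imitated; their behavior here is an
// assumption made so that the example runs outside DirStat 2.
function ViewJS() { }
ViewJS.prototype.createNode = function(parent) {
	var node = { id: null, pattern: null, userdata: null, children: [],
		ready: false };
	node.setReady = function() { node.ready = true; };
	if (parent !== null) {
		parent.children.push(node);
	}
	return node;
};

// Hypothetical view: groups files by size class and creates the children
// only when they are requested (the populateChildren pattern).
function SizeClassView() { }
SizeClassView.prototype = new ViewJS();
var ref = SizeClassView.prototype;
ref.root = null;
ref.create = function() {
	// only the root node is built eagerly
	this.root = this.createNode(null);
	this.root.id = "Size classes";
	this.root.pattern = "*";
};
ref.populateChildren = function(node) {
	if (node !== this.root) {
		node.setReady(); // assumption: leaves are marked ready without children
		return;
	}
	var labels = ["under 1 KiB", "under 1 MiB", "under 1 GiB", "1 GiB or more"];
	var self = this;
	labels.forEach(function(label, i) {
		var child = self.createNode(node);
		child.id = label;
		child.userdata = i; // would be used by createFilter (not shown)
	});
	node.setReady(); // the children of this node are now available
};
function create_view() { return new SizeClassView(); }

// Simulated lifecycle: create_view() -> create() -> populateChildren(root)
var view = create_view();
view.create();
view.populateChildren(view.root);
console.log(view.root.ready, view.root.children.length); // true 4
```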

Querying the Database

Because of DirStat 2's filter concept, querying is rather difficult. Depending
on whether you want to query without filters or with the filters the user
might have imposed by attaching your view below another view, you can use one
of two different methods.

	This method creates a query which ignores all filters. It just returns
	a standard Java `PreparedStatement` with no filters enabled. You should
	only use it if you know that the data you query is unique for the _whole
	database_ (including possible other scans in the same database!).

	This is the function you will more likely be using: It creates a query
	and attaches suitable filters to all `WHERE` clauses. If you do not
	have a `WHERE` clause yet, use `WHERE ?` to add a `WHERE` clause which
	only applies the filters. To populate such a `PreparedStatement`, invoke
	suitable `setString(...)` and similar methods for the `?`s you have
	written yourself (except for `WHERE ?`) and interleave them with
	suitable calls to `appendFilterValues(query, pos)`, which fill in the
	values of the implicit filters and of `WHERE ?` clauses. Check
	`extension.js` and `tree.js` for examples.
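
Why the calls have to be interleaved becomes visible on the SQL text itself.
The sketch below imitates the described rewriting outside of DirStat 2: a
filter, assumed here to be a clause plus its values, is placed in front of
every `WHERE` condition, and `WHERE ?` is replaced by the filter alone. It
returns the parameters in the order in which they would have to be bound.
`expandFilters`, the filter shape and all column names except `files.ext` are
hypothetical; the real implementation is in `default.js` and may differ.

```javascript
// Hypothetical sketch (not DirStat 2's implementation): rewrites every WHERE
// so that it also applies a filter, and returns the parameters in the order
// in which they must be bound to the resulting PreparedStatement.
// Limitations: "?" or "WHERE" inside string literals are not recognized, and
// a user condition with a top-level OR would need extra parentheses.
function expandFilters(sql, filter, userValues) {
	var params = [];
	var next = 0; // index of the next value supplied by the view author
	var expanded = sql.replace(/\bWHERE\s+\?|\bWHERE\b|\?/gi, function(m) {
		if (m === "?") {
			// an ordinary placeholder written by the view author
			params.push(userValues[next++]);
			return m;
		}
		// a WHERE keyword: the filter's values are bound at this position
		params.push.apply(params, filter.values);
		if (m.charAt(m.length - 1) === "?") {
			return "WHERE (" + filter.clause + ")"; // "WHERE ?" = filter only
		}
		return "WHERE (" + filter.clause + ") AND";
	});
	return { sql: expanded, params: params };
}

var extFilter = { clause: "files.ext = ?", values: ["txt"] };
var r = expandFilters(
	"SELECT name FROM files WHERE size > ? AND dir IN " +
	"(SELECT dir FROM files WHERE ?)", extFilter, [1024]);
console.log(r.params); // [ 'txt', 1024, 'txt' ]
```

The filter value ends up both before and after the view author's own `1024`,
which is exactly the situation where `setString(...)`-style calls and
`appendFilterValues(query, pos)` calls alternate.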


As you will write a mixture of Java and JavaScript, `String` objects can cause
serious trouble: Sometimes they behave like JavaScript strings and sometimes
like Java `String` objects. If you pass anything to a filter or query, as in
`query.setString(pos++, value)`, make sure it is a _Java_ String object and not
a JavaScript string. Otherwise your application might just hang at 0 % CPU
usage for no visible reason. Similar issues occur when invoking nonexistent
methods on objects.

--------------------------------------------------------------[ Known Issues ]--

No error dialog
	Error dialogs are a Tools 2.1 feature that has not yet been implemented.
	Therefore, DirStat 2 only logs errors to the console.

Single Threaded Querying
	Multiple connections should be used to perform queries in parallel.

Scanning Hangs or Terminates on Ma_Sys.ma 9 w/ Cryptovol

Dockerization / Upload on GitHub RC
