dirstat/2(33) Language: English

DirStat 2

Size [KiB]: 2460


DirStat 2 is a powerful tool for analyzing directory structures. If you encounter an HDD filled with many files, folders and subfolders, you might want to know which files are the largest, which file types take up the most space, which users the files belong to, or other things which you cannot answer directly using the utilities provided by the platform, especially on Windows systems.

On Linux you could write scripts for that purpose, but filenames can be very troublesome in scripts, and listing all that information neatly would require a complex set of scripts which would still be inferior to DirStat 2.

DirStat 2 tries to address these questions: You can scan a directory structure (refer to Scanning) and analyze it later, even on a separate system if you prefer your known working environment. The file metadata is stored in a PostgreSQL database, which allows the user to run arbitrary queries to get whichever information is required. The DirStat 2 GUI (see Evaluation) presents important information at a glance and can be extended via user-written JavaScript files to allow very specific views on the data.


This program includes and uses PostgreSQL's Java driver and NetBeans' Outline component.

Prerequisites and Setup

Depending on your situation, location and system, you will want to choose different ways of installing PostgreSQL. The most comfortable way is to install the package provided by your favorite Linux distribution or to use the normal Windows setup. If you are on a foreign system and do not run the database on an analysis system (probably because there is no separate system available for analysis), you might want to go for the PostgreSQL portable package http://sourceforge.net/projects/postgresqlportable/.

Depending on your installation, you may need to create a username and password. Regardless of how you installed PostgreSQL, you will need to create the database manually as described in PostgreSQL Setup.

When run for the first time, DirStat 2 will automatically create tables and indices as necessary. To clear all tables (but not delete the database itself) the script drop.sql is provided for convenience. You can run it via

$ psql masysma_dirstat
masysma_dirstat=> \i drop.sql

You might need to enter the whole path to drop.sql. In that interactive shell you can always perform ad hoc queries without writing your own script and without using the GUI. To find out which attributes and tables are defined, extract ma/dirstat/init.sql from your DirStat 2 JAR file.

If you want to clear all traces of DirStat 2's database from your system, run the following commands after the drop described above.

# su postgres
$ psql
DROP DATABASE masysma_dirstat;
DROP ROLE "linux-fan";

PostgreSQL Setup

Follow http://www.cyberciti.biz/faq/howto-add-postgresql-user-account/ and http://www.postgresql.org/docs/9.3/static/sql-createrole.html

# su postgres
$ psql
CREATE ROLE "linux-fan" WITH LOGIN PASSWORD 'testwort';
CREATE DATABASE masysma_dirstat;
GRANT ALL PRIVILEGES ON DATABASE masysma_dirstat TO "linux-fan";

On Windows systems, you can get the SQL Shell (psql) from the GUI menu.

You may choose arbitrary usernames and passwords to suit your needs, but the database name masysma_dirstat is fixed (unless you recompile DirStat 2).


Scanning

If you intend to perform only one scan with standard settings, you can enter

$ java -jar dirstat2.jar --scan --db=user:password@host

This will scan all devices on Windows, or the system root on Unix or Linux. If you want to scan a specific directory, you can alter the command this way:

$ java -jar dirstat2.jar --scan --db=user:password@host --src=/etc

If you want to give the scan a name (the default name is "default"), you can add a suitable parameter:

$ java -jar dirstat2.jar --scan --db=user:password@host --name=linux1

Names must be unique within one database. The --db parameter contains the PostgreSQL user, the password and the host (with an optional port) on which the server is running.

While the scan is running, status information is displayed; see Scan status for details. Further options for scanning additional metadata are listed via

$ java -jar dirstat2.jar --help

If you want to disable SSL, you can prepend INSECURE: to your login data. This is often necessary for Windows systems.
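The shape of the login data can be illustrated with a small JavaScript sketch. This is not DirStat 2's own parser: the function name parseDbParam, the returned field names and the host:port notation for the optional port are assumptions made for illustration only.

```javascript
// Illustrative only: splits a --db value of the assumed form
// [INSECURE:]user:password@host[:port] into its parts.
// parseDbParam and the field names are hypothetical, not part of DirStat 2.
function parseDbParam(value) {
    var prefix = "INSECURE:";
    var insecure = false;
    if (value.indexOf(prefix) === 0) {
        insecure = true;                         // SSL is to be disabled
        value = value.substring(prefix.length);
    }
    var at = value.lastIndexOf("@");             // host part follows the last '@'
    var colon = value.indexOf(":");              // user ends at the first ':'
    if (at < 0 || colon < 0 || colon > at) {
        throw new Error("expected user:password@host[:port]");
    }
    var hostPart = value.substring(at + 1);
    var portSep = hostPart.lastIndexOf(":");     // assumed port separator
    return {
        insecure: insecure,
        user:     value.substring(0, colon),
        password: value.substring(colon + 1, at),
        host:     portSep < 0 ? hostPart : hostPart.substring(0, portSep),
        port:     portSep < 0 ? null
                              : parseInt(hostPart.substring(portSep + 1), 10)
    };
}
```

Under these assumptions, parseDbParam("INSECURE:user:password@host") would report insecure as true and port as null.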


Evaluation

To open the GUI for evaluation, you can use the following command:

$ java -jar dirstat2.jar --eval --db=user:password@host --name=linux1

This will open the scan named linux1 in the GUI. Multiple GUI instances can view the same dataset (there is a real database behind it, after all), and it is possible to have another scan (with a different name) running while the GUI is open. Viewing the data you are currently scanning is not recommended because it might be temporarily inconsistent (this is not a database issue but a design choice of the scanning facility to maximize scanning performance).

The GUI should look similar to the following pseudo-graphic:

/--[ DirStat 2 ]------------------------------------------[ _ [] X ]--\
| Path          Pattern  Files  Size  Errors  Empty  Biggest etc.     |
| linux1                                                              |
|   Extensions  *.*      4096   10M   5       10     1M               |
|     txt       *.txt    600     2M   0        2     64K              |
|   FS Tree     /        4096   10M   5       10     1M               |
|     /test     scanp    2048    5M   0        2     512K             |
|     etc.                                                            |
| General Information  | Empty files         | File Size Distribution |
| .................... | ................... | ...................... |
| . Key     . Value  . | . /test/empty.txt . | .                    . |
| .................... | . /test/emty.txt  . | .          *         . |
| . Objects . 5000   . | . /test/.empx.txt . | .   *      **        . |
| . etc.    . etc.   . | . /test/.testrc   . | .  **     ****  *    . |
| .................... | ................... | ...................... |

As you can see, it is divided into four parts. The first and main part displays the so-called views, each of which can be a file system structure, a listing ordered by file extension, or similar. The views are displayed at the top of the frame. The three other parts are a table of general information, a list of empty files, and a diagram of the file size distribution whose x-axis is logarithmic to allow the user to view large file sizes (like 1 TiB) and small file sizes (like 4 KiB) in one diagram. The three lower parts always refer to the entry you have selected in a view in the top panel.
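To illustrate why a logarithmic x-axis lets 4 KiB and 1 TiB files share one diagram, the following sketch counts files per power-of-two size class. It is not taken from DirStat 2's sources; the function names sizeClass and sizeHistogram are hypothetical, and the actual diagram may use different class boundaries.

```javascript
// Hypothetical helper: class 0 holds sizes 0 and 1, class n holds sizes
// in the range [2^n, 2^(n+1)). Integer halving avoids rounding problems.
function sizeClass(size) {
    var bin = 0;
    while (size >= 2) {
        size = Math.floor(size / 2);
        bin++;
    }
    return bin;
}

// Returns an array whose index is the size class and whose value is the
// number of files that fall into that class.
function sizeHistogram(sizes) {
    var counts = [];
    for (var i = 0; i < sizes.length; i++) {
        var bin = sizeClass(sizes[i]);
        while (counts.length <= bin) {
            counts.push(0);          // create empty classes up to this one
        }
        counts[bin]++;
    }
    return counts;
}
```

With this scheme a 4 KiB file lands in class 12 and a 1 TiB file in class 40, so the whole range fits into a few dozen columns.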

To make advanced use of the views, you can attach a new view below an existing one. It will not only be displayed below that node in the tree structure but will also contain only the contents which occur below the node it was attached to. To view the file system structure of all .txt files, you might want to select the entry txt in the Extensions view and use the context menu to attach a new Default/tree.js view. The view will show you a subset of the scan in hierarchical tree form (like in the real filesystem), but it will only contain files and folders which are (or contain) .txt files. Similarly, you can attach a Default/extension.js view below a node from the FS Tree to view the file types below a specific directory.

To get more advanced and special information about the scan, you can also write your own views in JavaScript and attach them below other views or just add them to the GUI by attaching them below the scan name (here: linux1). The API for writing your own view is documented in the section Writing your own view below.

Scan status

During the scan, DirStat 2 will print status lines about errors and the scan progress. As there is only one scan pass, DirStat 2 does not know how far it has already come, so there is no real progress indicator. From the amount of data scanned (which you can compare roughly to your HDD fill level) you can get an estimate.

A normal status line consists of key-value pairs which contain information about the number (or size) of items scanned and the change since the last status display, which is marked with a + sign.

Scan Status Symbol Explanation

Symbol  Name         Description
f       Files        The number of files scanned.
d       Directories  The number of directories completely scanned.
        Errors       The number of errors that occurred.
s       Size         Summarized size of all entities scanned.
c       Committed    The number of queries sent to the database.
q       Queries      The total number of queries queued.

On fast filesystems it is normal that the scan has already completed while the data has not yet been committed to the database. When scanning a normal HDD for the first time, the filesystem scan time is likely to outweigh the database query time.

Writing your own view

To write your own view, create a JavaScript file with the following skeleton.

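function MyView() { }
MyView.prototype = new ViewJS();
var ref = MyView.prototype;

ref.root = null;

// Invoked when the view is added: create at least the root node here.
ref.create = function() {
        this.root = this.createNode(null);
        this.root.id = "Extension";
        this.root.pattern = "*.*";
};

// Invoked when the children of node should be created.
ref.populateChildren = function(node) {
        // ...
};

// Invoked when another view is attached below node or the lower panels
// are updated for node.
ref.createFilter = function(node) {
        return this.createDefaultFilter("files.ext = ?",
                                                [ node.userdata ]);
};

// Invoked to create an object of the view.
function create_view() { return new MyView(); }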

These functions have been copied from extension.js. Your own view will of course need a real name (although MyView would also work) and its own implementation of the methods. The methods are defined as described below.

create()
This function is invoked when your view is added. You should do basic initialization here. If your view will not add new nodes dynamically on demand but creates them all at the beginning (like the Extension view does), you should do the node creation here. At least the root node should be created here.
populateChildren(node)
If your view creates the children dynamically on demand, implement this method, which is invoked when the children should be created. The node parameter is either your root node or one of its children. To mark that this node's children are now available, call node.setReady().
createFilter(node)
This method is invoked if another view is to be attached below your view or the secondary panels are to be updated for the node given as parameter. default.js has a convenience function createDefaultFilter which allows you to write part of a prepared WHERE statement and give the parameters as a JavaScript array. This function returns the created filter, as you can see in the skeleton.
create_view()
This (not object-oriented) function is invoked to create an object of your view. It should normally not do anything beyond creating the object, as it might block the GUI. Do all initialization in create.
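As a rough illustration of a view which creates its children on demand, the sketch below fills in populateChildren. It rests on several assumptions: the ViewJS stand-in at the top is hypothetical and only exists so that the sketch runs outside DirStat 2 (the real implementation lives in default.js and will differ), passing a parent node to createNode is assumed rather than confirmed, and the name FirstLetterView and its fixed child list are made up.

```javascript
// --- Hypothetical stand-in for default.js, so the sketch runs on its own. ---
// Remove this part when loading the view into DirStat 2.
function ViewJS() { }
ViewJS.prototype.createNode = function(parent) {
        var node = { parent: parent, children: [], ready: false,
                     setReady: function() { this.ready = true; } };
        if (parent !== null) { parent.children.push(node); }
        return node;
};

// --- The view itself, following the structure of the skeleton. ---
function FirstLetterView() { }
FirstLetterView.prototype = new ViewJS();
var ref = FirstLetterView.prototype;

ref.root = null;

ref.create = function() {
        // Only the root node is created up front.
        this.root = this.createNode(null);
        this.root.id = "First letter";
        this.root.pattern = "*";
};

ref.populateChildren = function(node) {
        if (node === this.root) {
                // A real view would obtain these values from a query.
                var letters = [ "a", "b", "c" ];
                for (var i = 0; i < letters.length; i++) {
                        // Assumption: createNode takes the parent node.
                        var child = this.createNode(node);
                        child.id = letters[i];
                        child.pattern = letters[i] + "*";
                        child.userdata = letters[i];
                }
        }
        node.setReady();   // mark this node's children as available
};

function create_view() { return new FirstLetterView(); }
```

The working views in extension.js and tree.js remain the authoritative examples of how nodes are really created and populated.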

All views have to extend ViewJS, which is implemented in default.js. To get an overview of the methods available, you should extract default.js from the DirStat 2 JAR file and scan it for useful methods. For further information about implementing a view, feel free to extract extension.js and tree.js from the JAR: they both contain working views and show you how to use the functions you find in default.js.

Querying the Database

Because of DirStat 2's filter concept, querying is rather difficult. Depending on whether you want to query without a filter or with the filters which the user of the view might have imposed by attaching your view below another, you can use two different methods.

The first method creates a query which ignores all filters. It just returns a standard Java PreparedStatement with no filters enabled. You should only use it if you know that the data you query is unique for the whole database (including possible other scans in the same database!).
The second method is the one you will more likely be using: it creates a query and attaches suitable filters before all WHERE clauses. If you do not have a WHERE clause yet, use WHERE ? to add a WHERE clause which only applies the filters. To populate such a PreparedStatement, you need to invoke setString(...) and similar methods for the ?s you have created (except for WHERE ?) and interleave them with suitable calls to appendFilterValues(query, pos) to fill the implicit filters and WHERE ? clauses. Check extension.js and tree.js for examples.


As you will write a mixture of Java and JavaScript, String objects can cause serious trouble: sometimes they behave like JavaScript strings and sometimes like Java String objects. If you pass anything to a filter or query, as with query.setString(pos++, value), make sure it is a Java String object and not a JavaScript string. Otherwise your application might just hang at 0 % CPU usage for no visible reason. Similar issues occur when invoking nonexistent methods on objects.
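One way to guard against this is to convert values explicitly before handing them to Java. The sketch below assumes a JVM-based JavaScript engine which exposes the java package as a global (as Rhino and Nashorn do); which engine DirStat 2 uses is not stated here, and the helper name toJavaString is hypothetical.

```javascript
// Hypothetical helper: returns a java.lang.String when the java package is
// available (JVM-based engines), otherwise a plain JavaScript string.
function toJavaString(value) {
        // Normalize to a primitive JavaScript string first.
        var text = String(value);
        if (typeof java !== "undefined" && java.lang && java.lang.String) {
                return new java.lang.String(text);
        }
        return text;   // no JVM present, nothing to convert to
}
```

Inside a view this might be used as query.setString(pos++, toJavaString(node.userdata)). In the other direction, String(javaValue) turns a Java String into a JavaScript string before JavaScript string methods are called on it.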

Known Issues

No error dialog
Error dialogs are a Tools 2.1 feature that has not yet been implemented. Therefore, DirStat 2 only logs errors to the console.
Single-Threaded Querying
Multiple connections should be used to perform queries in parallel.
Scanning Hangs or Terminates on Ma_Sys.ma 9 w/ Cryptovol

Dockerization / Upload on GitHub RC
