0.01 WIP: Development notes about nfnetlink. (Work In Progress).
 - Jay Schulist <jschlst@samba.org>

Revisions of the document:
0.01	Jay Schulist		Initial creation of this document.

Table of Contents:
0). 		Overview of Netfilter via Netlink.
1).		Module and C file layout.
2).   		Netfilter via Netlink structure overview.
2.1). 		Netfilter via Netlink structure details.
2.1.1). 	Connection Tracking.
2.1.2). 	Iptables.
2.1.3). 	Firewall Monitor (fwmon).
2.1.4). 	User Log (ulog).
3). 		User space tools (iptables2).
4).		What is missing from this document.
Appendix A). 	Iptables2 nf help output.
Appendix B).	Changes to Iptables kernel.

0). Overview of Netfilter via Netlink.
Netfilter via Netlink, also known as Netfilter message support, is a generic
API that gives user space direct access to a collection of Iptables components.

Netlink provides a well defined and simple socket API to access kernel
tables. Add, Delete, Modify, Get and List operations on tables are provided
by default, and event notification is usually included as an added benefit.

Netfilter messages, technically known as nfnetlink, are the generic
interface to all Netfilter components.

1). Module and C file layout.
Nfnetlink is designed to have a base module which includes all generic
netfilter netlink hooks. Each sub-component of nfnetlink is included
as a separate sub-module. Each sub-module depends on iptable_netlink
and ideally does not depend on any other iptable_netlink sub-module.

iptable_netlink.[c,o]:	 	Netfilter message support.
 ip_netlink_conntrack.[c,o]: 	Connection tracking message support.
 ip_netlink_iptables.[c,o]:	Iptable message support.

linux/nfnetlink.h:		Header file for all nfnetlink commands
				and messages. nfnetlink sub-components
				are free to use their own include
				files for specific data structures.

2). Netfilter via Netlink C structures.
The goal of the current design of nfnetlink structures is to provide
comprehensive table information that is netfilter component independent.

That said, each nfnetlink sub-component generally requires specific
sub-structures to perform operations that are specifically related to the
component's table. Each sub-component nonetheless follows a general method
of operation for building structures and accessing structure data.

For all nfnetlink component operations there exist three basic operations:
GET, ADD, and DEL. Each operation may provide additional features. GET can
return a single table record or the entire table. ADD may add a single record
or it could replace an existing record. DEL may delete a single record or
multiple records of a table.
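
As a rough sketch of how these variants could be told apart on the wire, the
standard Netlink request flags from linux/netlink.h can carry the distinction.
The demo_op names are hypothetical, and the mapping follows the convention
used by other Netlink families; this document does not say which flags
nfnetlink actually uses.

```c
#include <linux/netlink.h>      /* NLM_F_* request flags (standard Netlink) */

/* Hypothetical operation variants; these names are not part of nfnetlink. */
enum demo_op {
        DEMO_GET_ONE,           /* GET a single table record */
        DEMO_GET_ALL,           /* GET the entire table */
        DEMO_ADD_NEW,           /* ADD a record, failing if it already exists */
        DEMO_ADD_REPLACE,       /* ADD a record, replacing an existing one */
        DEMO_DEL                /* DEL one or more records */
};

/* Return the nlmsg_flags value a request for the given operation might carry. */
static unsigned short demo_op_flags(enum demo_op op)
{
        switch (op) {
        case DEMO_GET_ONE:
                return NLM_F_REQUEST;
        case DEMO_GET_ALL:
                return NLM_F_REQUEST | NLM_F_DUMP;
        case DEMO_ADD_NEW:
                return NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
        case DEMO_ADD_REPLACE:
                return NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE;
        case DEMO_DEL:
                return NLM_F_REQUEST;
        }
        return 0;
}
```

Whether a DEL removes one record or several would then depend on the selector
in the message body rather than on the flags.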

Although a degree of freedom is provided to create component netlink message
structures at will, there is one ultimate rule that must be followed.

The rule is: any information provided to the user via nfnetlink through
a GET, ADD, or DEL command must be structured so that the received data can
be sent on another nfnetlink socket without any modification.

The reason for this is simple. A user can perform a GET command and then
use the data returned to execute an ADD or DEL command with no modifications
to the original data. This allows table contents to be propagated to other
hosts, which applications such as failover and backup rely on.
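
A minimal sketch of this reuse follows. It reads the rule as applying to the
record payload: only the Netlink header is rewritten, and the payload is
copied byte for byte. The DEMO_MSG_* values and demo_reuse_record() are
hypothetical; the real nfnetlink message types are not given in this document.

```c
#include <linux/netlink.h>      /* struct nlmsghdr, NLM_F_* flags */
#include <string.h>

/* Hypothetical message types, for illustration only. */
enum { DEMO_MSG_GET = 16, DEMO_MSG_ADD = 17, DEMO_MSG_DEL = 18 };

/*
 * Turn one record received in reply to a GET into an ADD or DEL request.
 * Returns the request length, or 0 if the reply is malformed or the
 * output buffer is too small.
 */
static size_t demo_reuse_record(const void *reply, size_t reply_len,
                                unsigned short new_type, unsigned int seq,
                                void *out, size_t out_len)
{
        struct nlmsghdr hdr;

        if (reply_len < sizeof(hdr))
                return 0;
        memcpy(&hdr, reply, sizeof(hdr));
        if (hdr.nlmsg_len < sizeof(hdr) || hdr.nlmsg_len > reply_len ||
            hdr.nlmsg_len > out_len)
                return 0;

        memcpy(out, reply, hdr.nlmsg_len);      /* payload stays untouched */
        hdr.nlmsg_type = new_type;
        hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
        hdr.nlmsg_seq = seq;
        hdr.nlmsg_pid = 0;                      /* addressed to the kernel */
        memcpy(out, &hdr, sizeof(hdr));         /* overwrite the header only */
        return hdr.nlmsg_len;
}
```

A failover peer could run every record of a GET reply through such a helper
and send the results on its own nfnetlink socket.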

2.1). Netfilter via Netlink structure details.
This section provides details on specific nfnetlink sub-component structures.
The general Netlink structures are not covered, nor are the generic nfnetlink
message structures, except where they are required to understand a specific
sub-component.

2.1.1). Connection Tracking.
Connection tracking via Netfilter Netlink, or Connection Tracking messages,
uses exactly the structures that are implemented in the ctnetlink version.

This section will be covered in more detail at a later date.

2.1.2). Iptables.
The Iptables component of nfnetlink (also known as iptnetlink) requires 
some specific consideration due to the nature of iptable operations.

The existing internal kernel table layout does not lend itself well 
to the use of Netlink socket communications.

The current implementation of iptables user-space communication requires the
user to have intimate knowledge of the internal workings of Iptables. Right
now essentially an exact copy of the Iptables kernel table structure is
delivered to the user without any modification. While this has the single
benefit of conceptually simple kernel-user communication, it does not bode
well for API simplicity.

In general the current kernel-user communication poses no restrictions on
iptnetlink. Because only a raw API is implemented, iptnetlink can perform
table operations without major modifications to the iptables kernel
facilities, while still giving the user low-level control.

Actual GET, ADD, and DEL structures:

struct iptmsg {
        unsigned char           iptm_family;
        char                    iptm_table[IPT_TABLE_MAXNAMELEN];
        char                    iptm_chain[IPT_FUNCTION_MAXNAMELEN];
        unsigned int            iptm_entry_num;
};

Each iptnetlink message includes the iptmsg header. This header specifies
the IP version, Table name, Chain name and Entry/Rule number that the attached
iptable message is associated with.
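
As an illustration, the header for one rule could be filled in as sketched
below. iptmsg_fill() is a hypothetical helper, not part of nfnetlink. The two
name limits are defined locally so the sketch builds on its own; their values
are assumed to match the kernel's ip_tables.h.

```c
#include <string.h>
#include <sys/socket.h>         /* AF_INET, AF_INET6 */

/* Local copies, assumed to match linux/netfilter_ipv4/ip_tables.h. */
#define IPT_TABLE_MAXNAMELEN    32
#define IPT_FUNCTION_MAXNAMELEN 30

struct iptmsg {
        unsigned char           iptm_family;
        char                    iptm_table[IPT_TABLE_MAXNAMELEN];
        char                    iptm_chain[IPT_FUNCTION_MAXNAMELEN];
        unsigned int            iptm_entry_num;
};

/*
 * Hypothetical helper: fill in the header for one table/chain/entry.
 * Returns 0 on success, -1 if a name does not fit in its field.
 */
static int iptmsg_fill(struct iptmsg *m, unsigned char family,
                       const char *table, const char *chain,
                       unsigned int entry_num)
{
        if (strlen(table) >= sizeof(m->iptm_table) ||
            strlen(chain) >= sizeof(m->iptm_chain))
                return -1;

        memset(m, 0, sizeof(*m));       /* zero padding and unused name bytes */
        m->iptm_family = family;
        strcpy(m->iptm_table, table);   /* length checked above */
        strcpy(m->iptm_chain, chain);
        m->iptm_entry_num = entry_num;
        return 0;
}
```

For example, iptmsg_fill(&m, AF_INET, "filter", "INPUT", 3) would describe
entry number 3 of the INPUT chain in the IPv4 filter table.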

enum iptattr_type_t {
        IPTA_UNSPEC,    /* [none] I don't know (unspecified). */
        IPTA_IP,        /* [ipt_ip] */
        IPTA_NFCACHE,   /* [u_int] */
        IPTA_COUNTERS,  /* [ipt_counters] */
        IPTA_MATCH,     /* [ipta_info] */
        IPTA_TARGET,    /* [ipta_info] */
        IPTA_MAX = IPTA_TARGET
};

struct ipta_info {
        u_int16_t       size;
        char            name[IPT_FUNCTION_MAXNAMELEN];
        unsigned char   data[0];
};

Following the iptmsg header are various entry-related attributes. Not all
attribute fields are required in all messages.
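
A match or target attribute carries an ipta_info with a variable-length
payload. The sketch below allocates one. ipta_info_new() is a hypothetical
helper, and it assumes that size counts the ipta_info header plus the data
bytes, which this document does not state. It uses a C99 flexible array
member and uint16_t in place of data[0] and u_int16_t so that it builds as
standard C.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy, assumed to match linux/netfilter_ipv4/ip_tables.h. */
#define IPT_FUNCTION_MAXNAMELEN 30

struct ipta_info {
        uint16_t        size;
        char            name[IPT_FUNCTION_MAXNAMELEN];
        unsigned char   data[];         /* match or target specific data */
};

/*
 * Hypothetical helper: allocate an ipta_info for the named match or target
 * and copy its private data behind the header. Returns NULL if the name or
 * the total size does not fit, or if allocation fails. Caller frees.
 */
static struct ipta_info *ipta_info_new(const char *name, const void *data,
                                       size_t data_len)
{
        struct ipta_info *info;
        size_t total = sizeof(*info) + data_len;

        if (strlen(name) >= sizeof(info->name) || total > UINT16_MAX)
                return NULL;

        info = calloc(1, total);
        if (info == NULL)
                return NULL;

        info->size = (uint16_t)total;   /* assumption: header plus data */
        strcpy(info->name, name);       /* length checked above */
        if (data_len > 0)
                memcpy(info->data, data, data_len);
        return info;
}
```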

Netlink communication dictates that each message is independent of any other
message. This does not mean that messages cannot be related, but each message
should contain enough information for the user to be able to perform useful
operations on the data.

In the case of iptnetlink this task is somewhat more complicated than in other
netlink implementations due to the order-dependent nature of iptable entries.
For this reason each iptmsg header carries the table name, chain name and
entry number. It is up to the user to ensure that the delivered messages
are reassembled into the correct sequence.

Iptnetlink does take care to transmit tables and chains in the correct order,
and Netlink does provide sequencing, but the user should not depend on these
properties for correct table->chain->entry ordering.
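
One way a user could restore the sequence is to sort the received headers
before applying them, as sketched below. The sketch assumes the iptmsg
headers have been copied into an array, one per message; iptmsg_cmp() and
iptmsg_sort() are hypothetical helpers. Tables and chains are only grouped
by name here, while entries within a chain are put in entry number order.

```c
#include <stdlib.h>
#include <string.h>

/* Local copies, assumed to match linux/netfilter_ipv4/ip_tables.h. */
#define IPT_TABLE_MAXNAMELEN    32
#define IPT_FUNCTION_MAXNAMELEN 30

struct iptmsg {
        unsigned char           iptm_family;
        char                    iptm_table[IPT_TABLE_MAXNAMELEN];
        char                    iptm_chain[IPT_FUNCTION_MAXNAMELEN];
        unsigned int            iptm_entry_num;
};

/* qsort() comparator: table name, then chain name, then entry number. */
static int iptmsg_cmp(const void *a, const void *b)
{
        const struct iptmsg *x = a;
        const struct iptmsg *y = b;
        int r;

        r = strncmp(x->iptm_table, y->iptm_table, sizeof(x->iptm_table));
        if (r != 0)
                return r;
        r = strncmp(x->iptm_chain, y->iptm_chain, sizeof(x->iptm_chain));
        if (r != 0)
                return r;
        if (x->iptm_entry_num < y->iptm_entry_num)
                return -1;
        return x->iptm_entry_num > y->iptm_entry_num;
}

/* Sort n received headers into table->chain->entry order. */
static void iptmsg_sort(struct iptmsg *msgs, size_t n)
{
        qsort(msgs, n, sizeof(*msgs), iptmsg_cmp);
}
```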

iptmsg {
  ipt_ip 	ip
  u_int	 	nfcache
  ipt_counters	counters
  ipta_info	match
  ...		...
  ipta_info	match
  ipta_info	target
}

The user receives an array of iptnetlink messages, each laid out as shown above.
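
Since an entry may carry any number of matches, a receiver has to walk the
attributes that follow the iptmsg header. This document does not define the
attribute encoding, so the sketch below assumes a simple length/type header
with 4-byte alignment, in the style of other Netlink attributes. struct
demo_attr, DEMO_ALIGN and demo_count_attrs() are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum iptattr_type_t {
        IPTA_UNSPEC, IPTA_IP, IPTA_NFCACHE, IPTA_COUNTERS,
        IPTA_MATCH, IPTA_TARGET, IPTA_MAX = IPTA_TARGET
};

/* Hypothetical attribute header; len covers this header plus the payload. */
struct demo_attr {
        uint16_t len;
        uint16_t type;
};

#define DEMO_ALIGN(n)   (((size_t)(n) + 3) & ~(size_t)3)

/*
 * Count the attributes of the given type in the buffer that follows an
 * iptmsg header. Returns -1 if the buffer is malformed.
 */
static int demo_count_attrs(const unsigned char *buf, size_t len,
                            unsigned int type)
{
        int count = 0;

        while (len >= sizeof(struct demo_attr)) {
                struct demo_attr a;
                size_t step;

                memcpy(&a, buf, sizeof(a));     /* avoid unaligned access */
                if (a.len < sizeof(a) || a.len > len)
                        return -1;
                if (a.type == type)
                        count++;

                step = DEMO_ALIGN(a.len);
                if (step > len)                 /* last one may be unpadded */
                        step = len;
                buf += step;
                len -= step;
        }
        return len == 0 ? count : -1;
}
```

Counting IPTA_MATCH attributes this way tells the receiver how many ipta_info
match blocks to expect before the single IPTA_TARGET.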

2.1.3). Firewall Monitor (fwmon).
Not covered at this time by nfnetlink.

2.1.4). User Log (ulog).
Not covered at this time by nfnetlink.

3). User space tools.
Due to the new API provided by Netfilter via Netlink, or Netfilter
messages, it is possible to create user space tools which are much
simpler than their predecessors.

The iptables2 package is a complete replacement for the current iptables
package. This new software provides all the previous capabilities of the
iptables package, including a backwards compatible iptables command line
interface. New features of iptables2 include the nf command, a generic
command facility that provides an enhanced command line interface.
Iptables2 makes exclusive use of Netfilter messaging.

Goals:
1. Use the Netfilter via Netlink API for all netfilter configuration
   communication procedures.
2. Provide common, useful C libraries for use by any application.
3. Provide at minimum the exact same feature set as Iptables.
4. Retain the existing Iptables extension API so as not to break any user
   extensions.
5. Create a generic tool to configure all Netfilter facilities, along the
   lines of the Iproute2 software package.
6. Provide a backwards compatible command line interface, so Iptables2 can
   be a drop-in replacement in existing scripts using the iptables/etc commands.
7. Ability to provide useful real-time table monitoring.

Iptables2 extra benefits:
1. Greatly reduced user-interface complexity and amount of code.
2. Generic library for 3rd party use that does not require advanced internal
   knowledge of Netfilter, specifically Iptables.
3. Ability to extend the user interface for new Netfilter components without
   the need to create new user-space commands (e.g. iptables, ip6tables,
   ippool), while retaining the defined interface of a common Netfilter
   configuration tool.

Iptables2 specific new features (Incomplete):
1. Return the rule number to the user upon successful completion of an
   Iptables append, insert or replace rule command.
2. Ability to request notification of table, chain, and rule modifications.
3. Ability to easily perform operations on multiple tables, chains, and rules.
4. Ability to efficiently save and restore netfilter component table
   information with fine granularity (entry, chain, table, world).

As a side note it is worth mentioning that the direction of iptables2 is to
provide a design that can simply be dropped into the iproute2 package.

So one could replace the nf command in iptables2 with the ip command. It makes
more sense if you look at Appendix A.

4). What is missing from this document.
This document is not complete. The intention of this document is
to cover the more critical areas of Netfilter messages.

Most things that are not covered are simply implementation issues such as:
- Changes to iptables. See Appendix B.
- Nfnetlink user-space library functionality.
- Exact iptables2 syntax. See Appendix A.

Some of these items are not covered because they are still volatile while
under development. For completeness, some information related to these
topics is attached.

Appendix A). Iptables2 nf help output.
The attached output is going to change and may not even be accurate right now.

[root@snapper nf]# ./nf help
Usage: nf [ OPTIONS ] OBJECT { COMMAND | help }
  OBJECT  := [ tables | conntrack ]
  OPTIONS := [ -v[ersion] | -h[elp] | -f[amily] { inet | inet6 } ]

[root@snapper nf]# ./nf tables help
Usage: nf tables { list | show } TABLE OPTIONS
       nf tables flush TABLE SELECTOR
       nf tables monitor proto CTPROTO SELECTOR
       nf tables get TUPLE
       nf tables { add | insert | append | replace | delete
                 | flush | zero | check | rename | policy } CHAIN OPTIONS
       nf tables { save | restore } TABLE OPTIONS
       nf tables { help }
OPTIONS := [ stats | number | verbose ]
SELECTOR := [ CHAIN ] [ match PREFIX ] [ exact PREFIX ]
            [ proto CTPROTO ]
TABLE := [ filter | nat | mangle | drop ]
TUPLE := [ CTPROTO SRC_IP SRC_PORT DST_IP DST_PORT ]
PREFIX := [ CTPROTO &| FAMILY &| IPADDR &| DEVICE ]
CTPROTO := [ udp | tcp | icmp ]
FAMILY := [ ipv4 | ipv6 ]
IPADDR := [ ip (orig|rply) (src|dst) xxx.xxx.xxx.xxx/X ]
DEVICE := [ in NAME | out NAME ]

[root@snapper nf]# ./nf conntrack help
Usage: nf conntrack { list | show | flush } SELECTOR
       nf conntrack monitor proto CTPROTO SELECTOR
       nf conntrack get TUPLE
       nf conntrack { add | delete } TUPLE
       nf conntrack { help }
SELECTOR := [ match PREFIX ] [ exact PREFIX ]
            [ proto CTPROTO ]
TUPLE := [ CTPROTO SRC_IP SRC_PORT DST_IP DST_PORT ]
PREFIX := [ CTPROTO &| FAMILY &| IPADDR &| DEVICE ]
CTPROTO := [ udp | tcp | icmp ]
FAMILY := [ ipv4 | ipv6 ]
IPADDR := [ ip (orig|rply) (src|dst) xxx.xxx.xxx.xxx/X ]
DEVICE := [ in NAME | out NAME ]

Appendix B). Changes to Iptables kernel.
This is here to relieve any stress caused by the words "Changes to Iptables".

Other than the addition of files in the linux/net/ipv4/netfilter directory
there are no changes that require mentioning. If there were, they would
be mentioned here. :)
