georg's blog

Articles

2014-11-13T12:24:00+01:00

A selection of texts on various topics that I recommend reading.

Vim Regular Expressions

2014-09-30T08:53:00+02:00

This page contains some really cool regular expressions for Vim. I'll extend it for future reference.

Search multiple lines

By default, a regex matches line by line. Line breaks can be matched explicitly with \n, which stands for an end-of-line in Vim patterns. Vim also supports the construct \_. to match any character including line breaks. The following regex:

/@incollection\_.\{-}\(\(booktitle\)\|\(crossref\)\)

searches a BIB file for all @incollection entries and highlights everything up to the following booktitle or crossref. The \{-} ensures non-greedy behavior; in other words, it stops at the shortest possible match (see below).

Source

Find the shortest possible match

By default, a regex matches as many characters as possible. Non-greedy behavior is often wanted, not only in multi-line matches (as shown above) but in other situations as well.

A simple approach to stop at the next occurrence of a character is to exclude that character from the part in between:

/<[^>]*>

But if the regex gets more complex, the special multi \{-} can be used. It matches as few characters as possible, i.e. the shortest possible match.

See :help non-greedy and Source
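The difference between greedy matching and stopping at the next > can also be tried in the shell. A minimal sketch with sed, which (unlike Vim) has no non-greedy multi, so the negated character class is the way to go there; the function names greedy and bounded are made up for this example:

```bash
#!/bin/bash
# greedy/bounded: hypothetical helpers contrasting two sed substitutions.
greedy ()  { sed -E 's/<.*>/[TAG]/'    <<< "$1"; }   # .* runs up to the LAST '>'
bounded () { sed -E 's/<[^>]*>/[TAG]/' <<< "$1"; }   # [^>]* stops at the FIRST '>'

greedy  '<b>bold</b> text'    # prints: [TAG] text
bounded '<b>bold</b> text'    # prints: [TAG]bold</b> text
```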

Search for lines NOT containing certain words

/@\(\(article\)\@!\&\(inproceedings\)\@!\&.*\){

I have a BibLaTeX file and want to check all entries that are not of the types @article or @inproceedings (all entry types are spelled in lower case, so I don't have to deal with case differences).

If the file contains:

01 @article{a,
02    title={Demo article},
03 }
04 @report{b,
05    title={Demo report},
06 }
07 @inproceedings{c,
08    title={Demo presentation},
09 }
10 @misc{d,
11    title={Demo web reference},
12 }

This search pattern finds lines containing an @ that is not followed by article or inproceedings, but by some other text and then an opening curly brace {. In the example, it matches lines 4 and 10.
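For comparison, the same filter can be sketched in the shell with two grep calls, since grep -E has no negative look-ahead like Vim's \@!: the first call keeps lines starting with @ (with their line numbers), the second drops the unwanted types. The function name other_entries is made up, and entries are assumed to start at the beginning of a line:

```bash
#!/bin/bash
# other_entries FILE: list entry headers whose type is neither @article nor
# @inproceedings (assumes one header per line, starting with '@', lower case).
other_entries () {
    grep -n '^@' "$1" | grep -vE '^[0-9]+:@(article|inproceedings)\{'
}
```

Applied to the example file above (without the line-number prefixes), it prints 4:@report{b, and 10:@misc{d,.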

Highlight output

2014-07-29T11:55:00+02:00

A previous posting explained how to use colors in the Bash prompt. Here is a convenient filter function to highlight words (or regexes) in files or in the output of commands.

# highlight words in output.
# use: cat /etc/passwd | hl root user
# from: http://chneukirchen.org/blog/archive/2013/07/summer-of-scripts-hl.html 
# zsh: -e${^*} expands $*="a b c" to -ea -eb -ec
# ported to bash by me using xargs
function hl () {
    local ARGS=''    # reset, so that -i from an earlier call does not stick
    if [[ $1 = '-i' ]]; then
        ARGS='--ignore-case'
        shift
    fi
    egrep $ARGS --color=always -e '' $(echo $* | xargs -n1 printf "-e%s ")
}

I have this in my .bash_profile.

The function first checks whether the first parameter is -i and, if so, passes the corresponding option --ignore-case to egrep (shift then removes the -i from the parameters). The main work is done by egrep. The parameter --color=always activates output coloring even if the output goes into a pipe (or is written to a file). This is required to forward the colors to less or further processing.

In the original by Ch Neukirchen, zsh can magically prepend a string to each parameter. This was ported to bash using a combination of xargs and printf. Effectively, egrep gets multiple expressions: the empty string matches every line, so all lines are printed (not only the matching ones), but since the match is empty, nothing is colored. Then the arguments of the hl function follow, each one prepended with -e.

hl root user

becomes

egrep --color=always -e '' -e root -e user
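The xargs/printf detour splits patterns at spaces, so hl 'b c' searches for b and c separately. A sketch of the same function that collects the -e arguments in Bash arrays instead (my own variant, not from the original); it uses grep -E, since recent GNU grep versions warn that egrep is obsolescent:

```bash
#!/bin/bash
# hl variant (sketch): builds the -e arguments in Bash arrays instead of
# using xargs/printf, so each pattern is passed on unchanged, even with spaces.
function hl () {
    local -a opts=() pats=()
    if [[ $1 = '-i' ]]; then
        opts=( --ignore-case )
        shift
    fi
    local p
    for p in "$@"; do
        pats+=( -e "$p" )               # one -e per pattern
    done
    # -e '' matches every line: all lines are printed, only real matches colored
    grep -E --color=always "${opts[@]}" -e '' "${pats[@]}"
}
```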

Use

As shown in the comments, this filter can be included in command pipes in the interactive command line:

$ cat /etc/passwd | hl root user

This will highlight the words 'root' and 'user' in the output. To highlight an entire line, use a regex:

$ cat /etc/passwd | hl '.*root.*'

Change color

Multiple instances of hl can be piped together. Since the coloring is done by egrep, that utility can be instructed to use different colors. In a Makefile for a LaTeX project, I have included the following:

check : $(MAIN).pdf
    cat $(MAIN).log \
      | GREP_COLORS='mt=01;33' egrep --color=always --ignore-case -e ''""'' -e'^.*float specifier changed to.*$$' \
      | GREP_COLORS='mt=01;34' egrep --color=always --ignore-case -e ''""'' -e'^.*underful.*$$' \
      | GREP_COLORS='mt=01;32' egrep --color=always --ignore-case -e ''""'' -e'^.*overful.*$$' \
      | GREP_COLORS='mt=01;31' egrep --color=always --ignore-case -e ''""'' -e'^.*warning.*$$' \
      | less -R

The environment variable GREP_COLORS is set differently for each instance of egrep. The cluster of quote signs ''""'' ends up as a single empty argument, just like the '' in the function above: make passes it on unchanged, and the shell running the recipe concatenates the three empty strings. Only the $ needs special treatment in a Makefile, hence the $$.

With --color=always, the matching lines are colored and forwarded to the next filter. Finally, a colorful output is presented in less.

The color codes are the same as in the first table of the color introduction.
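Outside a Makefile, the repeated egrep calls can be wrapped in a small function. A sketch, where the name hlc and its argument order (color code first, then the regex) are my own choice and GNU grep is assumed; note that less only renders the colors when started with -R:

```bash
#!/bin/bash
# hlc SGR REGEX (hypothetical helper): color every line matching REGEX
# (case-insensitive) with the SGR code SGR and pass all lines on.
hlc () {
    GREP_COLORS="mt=$1" grep -E --color=always --ignore-case -e '' -e "^.*$2.*\$"
}

# Usage, mirroring the Makefile above (main.log is an assumed file name):
#   hlc '01;33' 'float specifier changed to' < main.log \
#     | hlc '01;31' 'warning' | less -R
```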

References

YAML with Python

2014-05-15T10:13:00+02:00

YAML

YAML is a file format for representing structured data. Unlike JSON, which is derived from JavaScript, YAML is language-independent (although both are supported by a wide range of languages). It is also easily readable by humans.

Python

Example file example.yaml (see the YAML homepage or Wikipedia for more):

name: YAML
note: "YAML Ain't Markup Language"
events:
 - {date: 2004-01-29, note: 'Version 1.0'}
 - {date: 2005-01-18, note: 'Version 1.1'}
 - {date: 2009-10-01, note: 'Version 1.2'}

Open this file and parse it into Python:

>>> import yaml
>>> with open('example.yaml') as f:
...     dataMap = yaml.safe_load(f)
...
>>> dataMap
{'name': 'YAML', 'note': "YAML Ain't Markup Language", 'events': [{'date':
datetime.date(2004, 1, 29), 'note': 'Version 1.0'}, {'date': datetime.date
(2005, 1, 18), 'note': 'Version 1.1'}, {'date': datetime.date(2009, 10, 1),
'note': 'Version 1.2'}]}
>>> dataMap['name']
'YAML'

This is a Python dictionary. To save a dictionary to a YAML file, use yaml.dump(dataMap, f) (f being a file stream opened for writing). The source gives an example of how to convert this dictionary into a Python object:

>>> class MyStruct:
...     def __init__(self, **entries):
...         self.__dict__.update(entries)
...
>>> y = MyStruct(**dataMap)
>>> y.name
'YAML'
>>> y.events
[{'date': datetime.date(2004, 1, 29), 'note': 'Version 1.0'}, {'date': datetime.date(2005, 1, 18),
'note': 'Version 1.1'}, {'date': datetime.date(2009, 10, 1), 'note': 'Version 1.2'}]
>>> y.events[2]
{'date': datetime.date(2009, 10, 1), 'note': 'Version 1.2'}
>>> y.events[2]['note']
'Version 1.2'

References

Stackoverflow

Vim Copy and Paste

2014-04-07T12:16:26+02:00

Command mode commands

The most important commands are y to yank (copy) and p to paste, not to forget d to delete. The yank and delete commands put text into registers. If no register is specified, the default (unnamed) register is used.

Command	Meaning
`yy`	copy the current line to the default register
`dd`	delete the current line placing it in the default register
`p`	paste the content of the default register after the cursor (or below the current line)
`P`	paste the content of the default register before the cursor (or above the current line)
`"ayy`	yank (the line) into named register `a`
`"ap`	paste content of register `a`

Special registers

The following registers can be used like the named registers a to z, but they contain special information or have additional functions.

Register	Function
`"*`	X11 primary selection (copy by selecting text with the left mouse button and paste with the middle mouse button)
`"+`	X11 clipboard (Edit -> Copy/Paste or Ctrl-X/C/V)
`"/`	last search term
`":`	last command-line command (entered after `:` in the bottom line of the editor)

Modern Bash Scripts

2014-02-05T08:58:00+01:00

So far, this is just a short selection of items. I plan to convert this into a more elaborate form and provide examples sooner or later.

General

Bash now has many things built in that once required calling external commands. Most notably:

Conditions were traditionally tested with [...], which is equivalent to the program test. Using [[...]] instead lets Bash parse the condition itself: variables need no quotes to protect them from word splitting, and &&, || and pattern matching can be used inside. Since Bash implements [ and test as builtins too, the speed difference is smaller than often claimed, but [[...]] is more robust.
Simple arithmetic and string handling can be done with $(( 1 + I )) (instead of expr 1 + $I) and ${DIR/foo/bar} (instead of calling sed).
Regular expressions are built in as well (use [[ $VAR =~ regex ]]).
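A few of these builtins in action (the values are made up; the =~ match stores the capture groups in the array BASH_REMATCH):

```bash
#!/bin/bash
# Builtin replacements for small expr/sed/grep jobs.
I=41
echo $(( 1 + I ))                     # arithmetic instead of expr: prints 42

DIR=/home/foo/src/foo
echo "${DIR/foo/bar}"                 # first match only: /home/bar/src/foo
echo "${DIR//foo/bar}"                # all matches:      /home/bar/src/bar

VERSION='gcc version 4.8.2'
if [[ $VERSION =~ ([0-9]+)\.([0-9]+) ]]; then     # regex match, no grep needed
    echo "major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}"   # major=4 minor=8
fi
```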

Small bits and pieces

CamelCase

For an auto-generated TeX file, I want to automatically generate a CamelCase expression from filenames containing dashes and underscores. Many suggestions on the internet propose using sed or other secondary tools, but I tried to do it with Bash builtins.

Bash has the parameter expansions ${parameter^pattern} and ${parameter,pattern}, which convert the first character of the value to upper or lower case if it matches the pattern. By setting the pattern to * or ?, the first letter of the variable $parameter is always converted:

$ STR='abc def'
$ echo ${STR^?}
Abc def
$ echo ${STR^*}
Abc def
$ echo ${STR^^*}
ABC DEF

There are variants with two ^^ or ,, that convert every character matching the pattern. But how to convert only the first letter of every word? The solution is to use an array, because the conversion is applied to each element:

$ ARR=( $STR )
$ echo ${ARR[*]^*}
Abc Def

Back to the initial challenge: convert a string like my-filename_variant1. The approach is to convert the special characters to spaces and then split the string at word boundaries into an array.

$ TMP='my-filename_variant1'
$ TMP=${TMP//-/ }       # convert all '-' to spaces
$ TMP=${TMP//_/ }       # convert all '_' to spaces
$ ARR=( $TMP )          # tokenize words into an array
$ TMP=${ARR[*]^*}       # upper-case the first letter of all words
$ TMP=${TMP// /}        # remove spaces
$ echo $TMP
MyFilenameVariant1

There are probably more elegant ways with sed. I'm not sure whether it could be solved with Bash's regular expressions. And finally, I'm also not sure whether this is more efficient than forking sed, but it is amazing what can be done with Bash alone.

Source: I solved this problem with the creative help of Stackoverflow.
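The steps above can be bundled into a function. A sketch with a few changes: both special characters are replaced in one expansion with the pattern [-_], and read -a splits the words without the filename globbing that an unquoted ( $TMP ) would apply. The name camelcase is hypothetical, and ${words[*]^} (pattern omitted, which means ?) needs Bash 4:

```bash
#!/bin/bash
# camelcase STRING (hypothetical name): my-filename_variant1 -> MyFilenameVariant1
camelcase () {
    local s=${1//[-_]/ }            # dashes and underscores -> spaces
    local -a words
    read -r -a words <<< "$s"       # split at whitespace, no filename globbing
    local joined="${words[*]^}"     # upper-case first letter of each word, join with spaces
    printf '%s\n' "${joined// /}"   # drop the spaces
}

camelcase my-filename_variant1      # prints MyFilenameVariant1
```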

Git Analyze

2013-12-28T10:29:00+01:00

Ever found an old copy of a repository and didn't know its purpose or state? Was it just a test? Are there modifications that were not pushed anywhere? When was it cloned in the first place? And when was it last used?

I didn't find any Git commands or third-party tools to query this information (okay, I didn't search very thoroughly). This little script demonstrates what can be found by digging into the .git directory. The script is on GitHub.

Identify a Git repository

A Git repository carries its meta-information in a .git directory located in its base directory. If the current working directory is a subdirectory, the path must be followed upwards towards the root directory.

DIR=''
ORIG_DIR=$PWD
if [[ -d .git ]]; then
    DIR=$ORIG_DIR
else
    while [[ $PWD != / ]]; do
        cd ..
        if [[ -d .git ]]; then
            DIR=$PWD
            break
        fi
    done
fi
if [[ -z $DIR ]]; then
    echo "ERROR: no .git directory found in path '$ORIG_DIR'"
    exit 1
fi

These lines check whether there is a .git directory in the current working directory. If not, the script steps upwards until either the root directory is reached or a .git directory is found. If the loop ends because of the while condition, the variable DIR is still empty and an error message is printed.

This code changes to the base directory of the Git repository. Since the script runs in its own process, it does not need to store the original directory to restore it later: the working directory of the calling shell is not changed.
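A variant that leaves the working directory alone, sketched as a function that walks up the path with parameter expansion instead of cd. The name find_git_root is hypothetical; inside a real repository, git rev-parse --show-toplevel answers the same question directly:

```bash
#!/bin/bash
# find_git_root [DIR]: print the nearest directory at or above DIR (default: .)
# that contains a .git directory; return 1 if there is none up to /.
find_git_root () {
    local dir
    dir=$(cd "${1:-.}" && pwd) || return 1     # absolute path of the start directory
    while :; do
        if [[ -d $dir/.git ]]; then
            printf '%s\n' "${dir:-/}"          # empty dir means the root directory
            return 0
        fi
        [[ -z $dir || $dir == / ]] && return 1 # / has been checked as well: give up
        dir=${dir%/*}                          # strip the last path component
    done
}
```

Usage: DIR=$(find_git_root) || { echo "ERROR: no .git directory found"; exit 1; }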

Age of the repo

The age of the repo can be derived from the oldest file in the .git directory. This is probably not the most reliable method; the logs directory, explained later, holds a better source for this information.

# try to derive the age (date of init or clone) from .git files
# (use oldest file in .git directory)
echo -n "init'ed or clone'd most probably on: "
stat -c '%Y %y %n' "$DIR"/.git/* | sort -n | head -n1 | awk '{print $2 " " $3 " (" $5 ")"}'

The stat utility is given a format string to print the modification time as seconds since the UNIX epoch, the same time in a readable format, and the file name. This list is sorted numerically, the first line is extracted, and the columns 2 and 3 (date, time) and 5 (file name) are printed.

Simple information by `git` tools

A basic piece of information is .git/description, which can be set for every repository. It appears not to be used by the git tools themselves, but might be read by other tools (GitWeb) or hooks. If the file exists and does not contain the default text ("Unnamed repository..."), its content is printed.
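The check could look like the following sketch (not the code of the actual script): the function name print_description is hypothetical, and the default text is recognized by its first two words only:

```bash
#!/bin/bash
# print_description REPO_DIR: show .git/description unless it is missing,
# empty, or still the default text written by git init.
print_description () {
    local file=$1/.git/description desc
    [[ -r $file ]] || return 0            # no description file: nothing to print
    desc=$(<"$file")
    if [[ -n $desc && $desc != "Unnamed repository"* ]]; then
        echo "Description: $desc"
    fi
}
```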

Remote links

The remote URLs are printed by git remote -v, but each line is annotated with (fetch) or (push). To see only the URLs, the second column is cut out, sorted and de-duplicated:

echo -n "Remote links: "
git remote -v | cut  -f2 | cut -d' ' -f1 | sort | uniq

SVN connection

The command git svn info should reveal any ties to a Subversion repository. If the command gives an error, the output is suppressed:

SVNINFO=$(git svn info 2>&1)
if [[ ! "$SVNINFO" =~ ^Unable ]]; then
    echo -n "Git-SVN info: " $SVNINFO
fi

Last commit

The commit log can be flexibly formatted with git log. The script uses:

echo -n "Last commit: "
git --no-pager log --all -n1 --format="${COLGITHASH}%h ${COLGITDATE}%ci${COLGITRESET}%d ${COLGITSUBJECT}%s${COLGITRESET}"

The variables $COLGIT* contain the git log color placeholders %C(...) as documented in its man page. They can be set to empty strings to suppress color (see the final script linked above for the whole picture).
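A sketch of how such variables might be filled; the helper set_git_colors and the chosen colors are made up for this example:

```bash
#!/bin/bash
# set_git_colors on|off (hypothetical helper): fill the COLGIT* variables with
# git log color placeholders, or with empty strings to get plain output.
set_git_colors () {
    if [[ $1 == on ]]; then
        COLGITHASH='%C(yellow)'
        COLGITDATE='%C(green)'
        COLGITSUBJECT='%C(bold blue)'
        COLGITRESET='%C(reset)'
    else
        COLGITHASH='' COLGITDATE='' COLGITSUBJECT='' COLGITRESET=''
    fi
    FORMAT="${COLGITHASH}%h ${COLGITDATE}%ci${COLGITRESET}%d ${COLGITSUBJECT}%s${COLGITRESET}"
}

# usage: set_git_colors on; git --no-pager log --all -n1 --format="$FORMAT"
```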

Git logs

The directory .git/logs holds files with a history log for various refs. The HEAD log contains the initialization or cloning, commits, checkouts and pulls; fetches and pushes show up in the logs of the remote-tracking branches under refs/remotes.

The first line tells when the repo was created by initialization or cloning; in the latter case, the clone source is given. Depending on verbosity, the script prints either the first and last entries or the full history. Tokenizing the lines is a bit tricky because after the two hashes, the name can consist of one or multiple words. The e-mail address is enclosed in angle brackets. Then follow a UNIX time stamp (seconds since the epoch), the time zone and a description of the action.

function tokenize_log()
{
    read REV0 REV1 REST < <(echo "$@")           # prev. and current revision sha1, remainder
    echo_v3 "REV0   = '$REV0'"                   # echo_v3 prints only on verbosity >= 3
    echo_v3 "REV1   = '$REV1'"
    NAME=${REST%% <*}                            # Name is up to first angle bracket
    echo_v3 "NAME   = '$NAME'"
    REST=${REST#* <}                             # Remainder is after first angle bracket
    MAIL=${REST%%>*}                             # Mail is up to closing angle bracket
    echo_v3 "MAIL   = '$MAIL'"
    REST=${REST#*> }                             # Remainder is after closing angle bracket
    read TIME ZONE ACTION < <(echo $REST )       # Time, Zone, Action (multiple words)
    echo_v3 "TIME   = '$TIME'"
    echo_v3 "ZONE   = '$ZONE'"
    DATE=$(date -d@$TIME  +'%Y-%m-%d %H:%M:%S')  # convert UNIX time stamp into readable date
    echo_v3 "ACTION = '$ACTION'"
    if [[ -z $ACTION ]]; then
        ACTION="git init"                        # if no action: it was a `git init`
    fi
}

if [[ $PARAM_VERBOSE -ge 1 ]]; then
    # full history: loop over all lines of .git/logs/refs/heads/master
    echo -e "History of master"
    while read -r LINE; do
        # function fills global variables $REV0, $REV1, ..., $ACTION, $DATE, $NAME, $MAIL
        tokenize_log "$LINE"
        # print in a convenient format
        echo -e "   $ACTION on $DATE by $NAME $MAIL"
    done < "$DIR/.git/logs/refs/heads/master"
else
    # print only the first and last line of .git/logs/HEAD
    tokenize_log "$(head -n1 "$DIR/.git/logs/HEAD")"
    echo -e "Source of this Repo: $ACTION on $DATE"

    tokenize_log "$(tail -n1 "$DIR/.git/logs/HEAD")"
    echo -e "Last action: $ACTION on $DATE by $NAME $MAIL"
fi

The function tokenize_log receives a line of the log file and fills the global variables. The echo_v3 function prints the debug output only if the log level (verbosity) is three or higher.
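tokenize_log relies on a chain of parameter expansions. An alternative sketch splits a log line with a single Bash regular expression; it assumes the usual reflog layout (old hash, new hash, name, e-mail in angle brackets, time stamp, time zone, then a tab and the message) and fills the same global variables except DATE. The function name parse_reflog_line is hypothetical:

```bash
#!/bin/bash
# parse_reflog_line LINE: split one line of a .git/logs file into
# REV0 REV1 NAME MAIL TIME ZONE ACTION (alternative to tokenize_log).
parse_reflog_line () {
    local re='^([0-9a-f]+) ([0-9a-f]+) ([^<]*) <([^>]*)> ([0-9]+) ([-+][0-9]{4})(.*)$'
    local tab=$'\t' msg
    [[ $1 =~ $re ]] || return 1           # not a reflog line
    REV0=${BASH_REMATCH[1]}
    REV1=${BASH_REMATCH[2]}
    NAME=${BASH_REMATCH[3]}               # may contain several words
    MAIL=${BASH_REMATCH[4]}
    TIME=${BASH_REMATCH[5]}
    ZONE=${BASH_REMATCH[6]}
    msg=${BASH_REMATCH[7]#"$tab"}         # the message follows after a tab
    ACTION=${msg:-git init}               # same fallback as in tokenize_log
}
```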

The same output is created for all remotes:

for D in $DIR/.git/logs/refs/remotes/*; do
    REMOTE=${D##*/}                         # extract last part of path
    # get name of server
    REMSERVER=$(git remote -v | grep "^$REMOTE[[:space:]]" | cut  -f2 | cut -d' ' -f1 | sort | uniq)
    echo -e "History of $REMOTE ($REMSERVER)"
    if [[ $PARAM_VERBOSE -ge 1 ]]; then
        # full history
        while read -r LINE; do
            tokenize_log "$LINE"
            echo -e "   $ACTION on $DATE by $NAME $MAIL"
        done < "$D/master"
    else
        # only first and last entry
        tokenize_log "$(head -n1 "$D/master")"
        echo -e "  First action: $ACTION on $DATE"

        tokenize_log "$(tail -n1 "$D/master")"
        echo -e "  Last action: $ACTION on $DATE by $NAME $MAIL"
    fi
done

Inline Assembler

2013-12-17T13:29:00+01:00

Assembly instructions can be embedded in C code with the asm keyword:

asm ("mov ...");

Multiple lines must be separated by newlines:

asm ("mov ...\n"
     "add ...\n"
     "...");

The assembly instructions must be given in either Intel or AT&T syntax, both explained below. Some compilers or C standards require the keyword __asm or __asm__ (with two underscores); GCC, for example, does not accept asm in strict ISO C modes such as -std=c99. The keyword volatile instructs the compiler not to change or remove an inline assembly block, for example to prevent the compiler from removing an empty loop:

for (int i = 0; i < 100000; i++) {
    asm volatile ("nop");
}

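Global C variables can be referenced by name in the assembly code, since their symbol names (labels) are known to both the compiler and the assembler. Local variables have no such labels; they are passed via the operands of extended inline assembly (see below).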

GCC

The GNU Compiler Collection does not interpret the embedded assembly; it just copies it into the assembly code it generates, which is then translated to machine instructions by the GNU assembler. Since GCC uses the AT&T syntax (probably since it is easier to machine-generate), the embedded code must be in that syntax. However, the compiler (and subsequently the assembler) can be configured to use and understand the Intel syntax:

$ gcc -masm=intel

This instructs the compiler to generate Intel syntax assembly and consequently also calls the assembler with that parameter. Now, the inline assembly can be in the (to me) more natural Intel syntax.

Extended Inline Assembly

To help GCC correctly embed the piece of assembly into its own generated code, it must be provided with some information about the assembly:

input: values to load into registers before the execution of this block
output: where the register values should go after the execution
clobbered: other registers whose content is changed by the block, so the compiler must not rely on their previous values (it saves them or avoids using them)

The complete syntax is:

asm ("mov ..." : /*output*/ : /*input*/ : /*clobbered*/ );

The output and input operands specify what goes where. In the following example, "=c" means "the new value of ECX", and "a" means "load EAX with ...". The full list of these constraints is in the documentation linked below.

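#include <stdint.h>

uint32_t func = 1;
uint32_t reg_eax, reg_ecx, reg_edx;
/* cpuid overwrites EAX as well, so EAX must also be an output:
   an input-only operand must not be modified by the block */
asm volatile ( "cpuid"
                : "=a"(reg_eax), "=c"(reg_ecx), "=d"(reg_edx)  /* output: EAX, ECX, EDX */
                : "a"(func)                     /* input: func into EAX */
                : "ebx" );                      /* also modified: EBX */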

References

Intel syntax

The Intel syntax is the natural one (used in most textbooks and in the processor manuals from Intel and AMD). Its major property is the order destination, source (like in a variable assignment in C):

MOV eax, ebx        ; EAX := EBX
ADD ebx, 10         ; EBX += 10

Example program:

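#include <stdio.h>

int x = 5;                 /* global: the assembler knows the symbol x */
int main(void) {

  printf("x = %d\n", x);
  asm (
   "mov eax, x"   "\n"     /* load x */
   "mov ecx, eax" "\n"
   "add ecx, eax" "\n"     /* ecx = 2 * x */
   "mov x, ecx"   "\n"     /* store the result back to x */
   ::: "eax", "ecx", "memory"   /* x is changed behind the compiler's back */
  );
  printf("x = %d\n", x);

  return 0;
}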

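This program compiles with gcc -masm=intel. (On systems where GCC builds position-independent executables by default, linking may fail because x is addressed absolutely; adding -no-pie should help.) With gcc -S -o - -masm=intel filename.c, the assembly generated by the compiler is printed to the screen in Intel syntax.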

Theoretically, both syntaxes can be mixed by using the assembler directives .intel_syntax noprefix and .att_syntax prefix. But when doing so, don't forget to restore the AT&T setting at the end of every inline assembly block.

AT&T syntax

GCC uses this syntax by default. It has the order source, destination. Further, the instructions are suffixed with a modifier indicating the operand width (here: l for long = 32 bit), register names are prefixed with %, and constant numbers must be prefixed with a $ sign.

movl %ebx, %eax     # EBX -> EAX
addl $10, %ebx      # (EBX + 10) -> EBX

Git-Subversion

2013-12-17T10:56:00+01:00

This article shows how to import a Subversion repository to Git.

Import Subversion Repository

based on a migration tutorial

Prepare a file of users (Edit the lines to match uname = Firstname Lastname <email@example.com>)

$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt
$ vim authors-transform.txt

Clone the Subversion repository
```
$ git svn clone https://svn.example.com/svn/baserepo/some/subdirectory --no-metadata -A ../authors-transform.txt
```
- --no-metadata : if that's a one-time import, this waives the linkage of Git Commits to Subversion Revisions. If you want to update the Git Repo after further Subversion Commits, don't use this parameter!
- -A ../authors-transform.txt : uses the previously created authors file to translate Subversion usernames into the names and e-mail addresses of Git commits.

If you have set svn:ignore properties, convert them to .gitignore

$ git svn show-ignore > .gitignore
$ git add .gitignore
$ git commit -m "Convert svn:ignore properties to .gitignore"

Bring the Repository to a Server. Here are two alternatives.
1. Create an attached Git repository that can pull in (and merge?) later changes from Subversion (untested so far). Don't use the --no-metadata parameter to git svn clone (step 2) in this case!
  - Create a repository on a server (or locally) to push to
```
$ ssh git@www.example.com
$ mkdir -p git/MyNewRepo.git
$ cd git/MyNewRepo.git
$ git init --bare
$ exit
```
  - Then add the new repository to the Subversion checkout
```
$ git remote add bare git@www.example.com:git/MyNewRepo.git
$ git config remote.bare.push 'refs/remotes/*:refs/heads/*'
$ git push bare
$ git push bare master
```
  The refspec in the second command configures git push bare (without an explicit branch) to always push all remote-tracking branches to this remote repo.
2. This procedure creates a detached, one-time import to move a repository from Subversion over to Git. It uses a temporary Git repo (called git-svn) that pushes to a server. By doing so, the connection to Subversion is lost (this can most probably also be achieved in other ways...).
  - The new repo was created on Github and cloned to the local hard disk. This creates a clean connection between the local clone and the upstream repo.
  - Then I had two Git repos (and the old Subversion checkout): new.git is a clone of the Github repo and is correctly tied to it (for pushing).
```
checkout/$ ls
new.git     old.git-svn      old.svn
```
  - In the Git-Svn repo, add the new.git as remote repo with the name myclone:
```
checkout/old.git-svn/$ git remote add myclone ../new.git
checkout/old.git-svn/$ git config remote.myclone.push 'refs/remotes/*:refs/heads/*'
checkout/old.git-svn/$ git push myclone
```
  - This results in new.git having the new branch git-svn that can be merged to the master branch:
```
checkout/new.git/$ git branch
 git-svn
*master
checkout/new.git/$ git checkout git-svn
checkout/new.git/$ ls
checkout/new.git/$ git checkout master
checkout/new.git/$ git merge git-svn
checkout/new.git/$ git branch -d git-svn
```
  - Check everything, and if it's fine, push the new master branch to the Server.
There are more steps to follow in the original tutorial, especially if using the default Subversion layout with trunk, branches and tags. Check if everything works, then you can delete the git-svn checkout.

Parsing Command-Line Parameters

2013-12-10T19:45:00+01:00

Basic

The parameters are in the variables $1, $2, etc. The special variable $# contains the number of parameters. Further, $0 contains the name of the script as it was called by the user.

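To handle more than 9 parameters or to process the parameters one by one, the builtin command shift moves all parameters down by one position ($2 becomes $1, and so on). (Parameters from the tenth on can also be accessed directly with braces, e.g. ${10}.)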

param1.sh

#!/bin/bash
echo $1 $2 $3
shift
echo $1 $2 $3

executed with parameters "a", "b", and "c":

$ ./param1.sh a b c
a b c
b c

This can be used for simple processing of multiple input options:

param2.sh

echo $@
while [[ $# -gt 0 ]]; do
    echo $1
    shift
done

The first line prints all parameters ($@ is another special variable), then the while loop prints $1 and shifts the remaining parameters until the number of parameters $# is 0.

$ ./param2.sh a b c
a b c
a
b
c

This can be extended with if or case to check for actual options (see the examples in the following sections).
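A sketch of such a loop with case; the options -v/--verbose (may be repeated) and -o/--output FILE, as well as the function name parse_args, are made up for this example:

```bash
#!/bin/bash
# parse_args: hypothetical manual parser filling VERBOSE, OUTFILE and FREE.
parse_args () {
    VERBOSE=0
    OUTFILE=''
    FREE=()
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -v|--verbose)
                VERBOSE=$(( VERBOSE + 1 )) ;;
            -o|--output)
                if [[ $# -lt 2 ]]; then
                    echo "option $1 needs an argument" >&2
                    return 1
                fi
                OUTFILE=$2
                shift ;;                  # drop the option's argument as well
            --)
                shift
                FREE+=( "$@" )            # everything after -- is a free argument
                break ;;
            -*)
                echo "unknown option: $1" >&2
                return 1 ;;
            *)
                FREE+=( "$1" ) ;;
        esac
        shift
    done
}
```

Unlike getopt and getopts below, this loop does not split combined switches such as -vv.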

The major advantage of the tools presented in the following sections is their normalizing capability. If multiple switches (parameters without argument) are given together as -abc, normalizing rewrites them to -a -b -c.

Getopt

getopt is an external program that parses the parameters and creates a unified (normalized) string that can be parsed. The line eval set -- "$ARGS" sets the parameters of the script to the normalized version produced by getopt.

param3.sh

ARGS=$(getopt -o 'n:v::h' -l 'help' -- "$@") || exit 1   # parse parameters; abort if getopt reports an error
eval set -- "$ARGS";                           # set parameters to preprocessed string $ARGS

while [[ $# -gt 0 ]]; do
    echo -n "[$1] "
    case "$1" in
        -n)
            echo "found parameter -n"
            echo "     required parameter is '$2'"
            shift # remove required parameter
            ;;
        -v)
            echo "that's a -v"
            if [[ -n $2 ]]; then
                echo "     optional parameter '$2'"
            else
                echo "     no optional parameter provided"
            fi
            shift # remove optional parameter
                  # (inserted as empty string if not provided by user)
            ;;
        -h|--help)
            echo "someone asked for help with -h or --help"
            ;;
        --)
            echo "that was the last option, following are free parameters"
            ;;
        *)
            echo "that's an unknown parameter"
    esac
    shift
done

The optstring (after -o) defines the valid parameters. Like in the C library function getopt(), the parameters can have an optional argument (two colons) or a required one (one colon). In the example above, the -n must be given an argument, the -v has an optional argument. The -h and --help never expect an argument.

$ ./param3.sh -n3 -v -v3 -h --help
[-n] found parameter -n
     required parameter is '3'
[-v] that's a -v
     no optional parameter provided
[-v] that's a -v
     optional parameter '3'
[-h] someone asked for help with -h or --help
[--help] someone asked for help with -h or --help
[--] that was the last option, following are free parameters

See the man-page to getopt for more details.

Getopts

getopts is a Bash builtin. As such, it is somewhat easier and more straightforward to use, since it can be used directly in the while loop:

param4.sh

while getopts n:vh PARAM; do
    echo -n "[$PARAM] "
    case "$PARAM" in
        n)
            echo "found parameter -n"
            echo "     required parameter is '$OPTARG'"
            ;;
        v)
            # getopts knows no optional arguments (v:: as in getopt),
            # so -v is a plain switch here
            echo "that's a -v"
            ;;
        h)
            echo "someone asked for help with -h"
            ;;
    esac
done

# additional free arguments:
shift $(( OPTIND - 1))
echo "more parameters: '$@'"

The changes are:
- getopts is directly put in the while loop and puts its result in $PARAM.
- It does not support long options (e.g. --help).
- It does not support optional arguments either: a letter followed by a colon always requires an argument.
- The cases contain only the letter, while getopt included the dash.
- Additional non-option (free) arguments remain unread and can be processed later (after shifting out all processed parameters).
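When getopts is used inside a function, OPTIND should be made local, otherwise a second call continues where the first one stopped. A leading colon in the optstring switches to silent error handling: an unknown option sets the variable to ? and a missing argument sets it to :, with the option letter in OPTARG. A sketch (parse_opts and its options are hypothetical):

```bash
#!/bin/bash
# parse_opts: hypothetical getopts wrapper filling N, HELP and REST.
parse_opts () {
    local OPTIND=1 opt          # local: every call starts at the first argument
    N=''
    HELP=0
    while getopts ':n:h' opt; do
        case $opt in
            n)  N=$OPTARG ;;
            h)  HELP=1 ;;
            :)  echo "option -$OPTARG needs an argument" >&2; return 1 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $(( OPTIND - 1 ))     # drop the processed options
    REST=( "$@" )
}
```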

Bibgrep and Texgrep

2013-12-10T13:08:00+01:00

If you enter a command more than twice, put it in a script.

I don't remember who said this, but it is a good rule of thumb for administrators and essentially everybody working on a shell. In fact, that is one of the reasons why I love working with Linux.

I found myself grep'ing again and again in the same files, so I optimized the workflow. Both scripts are in my Scripts repository on Github.

bibgrep

bibgrep started as a tool just to search my BibLaTeX files. But as grep usually only displays the matching line (or a constant number of surrounding lines), I ended up repeatedly opening the found position(s) in an editor. Therefore, bibgrep uses sed to display whole entries up to the closing brace (which always sits on its own line in my bib files). Some coloring makes the output more readable.

The most recent improvement is storing the identified BIB key in the X11 clipboard (using xsel). This allows me to directly paste it in the editor or to use texgrep subsequently.
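The actual bibgrep is in the repository; the following is only a minimal sketch of the grep-plus-sed idea, without coloring and clipboard. The name bibgrep_sketch is hypothetical, the search term is treated as a basic regex matched against the entry key, and the closing brace is assumed to sit alone at the start of a line:

```bash
#!/bin/bash
# bibgrep_sketch TERM FILE...: print every entry whose key matches TERM,
# from the @type{ line down to the closing brace at the start of a line.
bibgrep_sketch () {
    local term=$1 file line
    shift
    for file in "$@"; do
        # line numbers of entry headers whose key contains TERM
        grep -n "^@[a-zA-Z]*{[^,]*$term" "$file" | cut -d: -f1 |
        while read -r line; do
            echo "$file:$line"
            sed -n "${line},/^}/p" "$file"      # header line up to the next '}' line
        done
    done
}
```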

texgrep

texgrep is hardly more than a one-liner, but if no parameters are provided, it uses the X11 clipboard as search term. In my workflow, I use bibgrep to find an entry (or check if it exists) and then use texgrep to find all places where it is cited:

$ bibgrep Kato
../../bib/linux.bib:120
@inproceedings{Kato2010Airs,
  [...]
}
../../bib/linux.bib:344
@inproceedings{Kato2008Modular,
  [...]
}
$ bibgrep Kato2010
../../bib/linux.bib:120
@inproceedings{Kato2010Airs,
  [...]
}
$ texgrep
TERM='Kato2010Airs'
chap01b_related.tex:366:\citetitle{Kato2010Airs}

The first search for "Kato" found two entries, therefore the search was repeated with the year: "Kato2010". This is unique, and a following texgrep searches for the previously identified key "Kato2010Airs" and shows that this BIB entry is cited in one TEX file (including the line).

The next improvement would be to store this information and somehow direct my Vim instance to jump to that line...
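texgrep could be sketched like this (not the actual script: the name texgrep_sketch is hypothetical, only *.tex files in the current directory are searched, and xsel must be installed for the clipboard fallback):

```bash
#!/bin/bash
# texgrep_sketch [TERM]: search all *.tex files in the current directory for
# TERM, or for the content of the X11 clipboard if no TERM is given.
texgrep_sketch () {
    local term=${1:-$(xsel --clipboard --output)}
    echo "TERM='$term'"
    grep -n -H -F -- "$term" *.tex     # file:line:text, TERM taken literally
}
```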

Vim Spell Checking

2013-12-03T08:21:14+01:00

Built-in Spell Checker

Activate for the current buffer (optionally set the language):

:setlocal spell
:setlocal spell spelllang=en_us

Source: Vim Spell-Checking (it also explains how to turn on spell checking automatically for selected file types)

Command	Meaning
`]s`	next marked word
`[s`	previous marked word
`z=`	show a list of proposed words
`zg`	add marked word to the list of correct words
`zG`	add marked word to a temporary list of correct words

Language Tool

Java program for spell-checking (not tested, yet)

Vim Gems

2013-12-03T08:14:02+01:00

A small collection of tricky hacks.

Insert date- and timestamp

:nnoremap <F5> "=strftime("%c")<CR>P
:inoremap <F5> <C-R>=strftime("%c")<CR>
:iab <expr> DTS strftime("%c")

The first line is for Normal mode: it places the time stamp in the expression register and pastes it at the cursor position when pressing F5. The second line is for Insert mode (again on hitting F5), and the third line defines an abbreviation that replaces the characters "DTS" with a timestamp. The choice of that abbreviation is probably not the best idea, because while typing this text, I got the timestamp instead of the three letters D, T, S.

Source: Insert current date or time

Vim Configuration and Settings

2013-12-03T08:11:45+01:00

So far, this is a collection of bits and pieces about configuring Vim.

Basics

Options can be set and read like variables with the ex command :set. A plain :set shows all options that differ from their default values, and the current value of a single option can be inspected with :set {option}?. There are toggle options as well as string and number options. Toggle options are switched on with :set {option} and off with :set no{option}; the other options are set with :set {option}={value}. Options that are a list of flag characters can be modified by adding or removing single flags with :set {option}+=a or :set {option}-=b, which avoids overwriting the whole set of flags. Many other operations are possible (e.g. inverting a toggle option with :set {option}!), refer to the help page. With :verbose set {option}?, vim displays where an option was last set.

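There are global settings and settings that are local to the current buffer (or window). For local options, :set changes both the local and the global value, :setlocal only the local one and :setglobal only the global one: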

:set
:setlocal
:setglobal

See: :help options

Config files

The main config file is $HOME/.vimrc. If the option exrc is set, vim also reads a .vimrc file in the current working directory, if one exists. It can be used to override some settings on a per-project basis.

See: :help vimrc

Modelines

Modelines are per-file settings. By default, vim checks the first and last 5 lines of a file for a modeline. A modeline usually begins with a comment sign (depending on the programming language, e.g. // for C, # for Bash and Python, % for LaTeX), followed by the marker vim:. Only setting options is supported, so that hostile text files cannot manipulate your editor and system.

Two forms are supported:

// vim: sw=3 ts=6
/* vim: set sw=3 ts=6 : */

The first form lists only the options to set separated either by space or colon. The second form begins with set and ends with a colon that can be followed by other text, for example a comment terminator.

See: :help modeline

Bash Debugger

2013-11-22T07:45:00+01:00

Built-in Debugging Features

The simplest way is to activate debugging output with the command line option -x or inside the script with set -x. Bash then prints every command to stderr before it is executed. In those trace lines, variables are already replaced with their values, which makes it easier to see what's happening. Combined with the option -v, which prints each line as it is read from the script, a malfunction can often be traced back to the line where it happened, and the variables can be analyzed.

The fourth prompt, PS4, sets the prefix of the -x trace lines, for example to show the source file, line number and function name:

export PS4='+${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]}: '

or, extended with colors:

export PS4='+\[\033[1;33m\]${BASH_SOURCE}\[\033[0m\]:\[\033[0;31m\]${LINENO}\[\033[0m\]:\[\033[1;34m\]${FUNCNAME[0]}\[\033[0m\]: '

If you're debugging a larger script, you can activate tracing for only parts of the script:

N=4
set -x      # start tracing
N=$(( N * 2 ))
set +x      # stop tracing
echo $N
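As a sketch, partial tracing can be combined with PS4 and the BASH_XTRACEFD variable (Bash 4.1 or later) to write the trace lines into a separate file instead of mixing them with the script's own stderr output. The temporary trace file is an assumption of this example:

```bash
#!/usr/bin/env bash
# Prefix each trace line with the line number of the traced command.
PS4='+${LINENO}: '
trace=$(mktemp)          # file that receives the trace output
exec 5>"$trace"          # open file descriptor 5 for writing
BASH_XTRACEFD=5          # Bash >= 4.1: write xtrace output to fd 5

N=4
set -x                   # start tracing
N=$(( N * 2 ))
set +x                   # stop tracing (this command itself is still traced)
echo "$N"                # prints 8
cat "$trace"             # the traced lines, including the expanded "N=8"
```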

BashDB

The BashDB is a wrapper that executes shell scripts with debugging features such as breakpoints, inspecting variables and changing the script under test.

Colorful Prompts

2013-11-22T07:45:00+01:00

Colors can be set in the terminal with control sequences that start with \033[ and end with m. The echo command needs the option -e to interpret these sequences:

echo -e 'Hello \033[1;31mworld\033[0m!'

The 033 is octal for 27 (decimal) which is the ASCII code for escape.
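As an alternative to echo -e, printf interprets \033 in its format string directly. A small sketch; the helper name colorize is made up for this example:

```bash
#!/usr/bin/env bash
# colorize CODE TEXT: print TEXT in the color given by CODE (e.g. "1;31"),
# followed by the reset sequence \033[0m.
colorize() {
    printf '\033[%sm%s\033[0m' "$1" "$2"
}

colorize '1;31' 'light red'; echo
colorize '44;1;33' 'yellow on blue'; echo

# \033 is octal for 27, the ASCII code of ESC:
# printf '%d' "'c" prints the numeric code of the character c.
printf '%d\n' "'$(printf '\033')"    # prints 27
```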

Text color

Sequence	Color
`\033[0;30m`	black
`\033[1;30m`	dark gray
`\033[0;31m`	red
`\033[1;31m`	light red
`\033[0;32m`	green
`\033[1;32m`	light green
`\033[0;33m`	brown
`\033[1;33m`	yellow
`\033[0;34m`	blue
`\033[1;34m`	light blue
`\033[0;35m`	violet
`\033[1;35m`	light violet
`\033[0;36m`	teal
`\033[1;36m`	cyan
`\033[0;37m`	silver
`\033[1;37m`	white
`\033[0m`	default

Background color

Sequence	Color
`\033[0m`	default (no background)
`\033[40m`	black background
`\033[41m`	red background
`\033[42m`	green background
`\033[43m`	brown background
`\033[44m`	blue background
`\033[45m`	violet background
`\033[46m`	teal background
`\033[47m`	silver background

Example script

for FG in 30 31 32 33 34 35 36 37; do
    for LI in 0 1; do
        echo -en "\033[${LI};${FG}m${LI};${FG} "
        for BG in 40 41 42 43 44 45 46 47; do
            echo -en "\033[${BG}m ${BG} "
        done
        echo -e "\033[0m"
    done
done

Text and background color can be set in one control sequence:

echo -e "Print \033[44;1;33myellow on blue\033[0m"

Prompts

To use colors in prompts, just use the same codes, but enclose them in \[ and \]. These markers start and end a sequence of non-printing characters. Bash needs them to compute the visible length of the prompt; without them, line editing and the wrapping of long command lines get confused.

export PS1='\[\033[1;34m\]\u\[\033[0m\]@\[\033[0;32m\]\h\[\033[0m\]:\[\033[1;33m\]\w\[\033[0m\]$ '

This example changes the color to light blue, prints the username \u, resets the color to default, prints the character "@", changes to green, prints the hostname \h, then a colon in the default color, the working directory \w in yellow and a dollar sign in the default color.

Valuable sources:

archlinux: Color Bash Prompt

Prompts

2013-11-22T07:45:00+01:00

Bash supports four different prompts:

PS1 is the usual prompt when interactively entering commands
PS2 is used when a command spans multiple lines (e.g. if a string is still open when you press ENTER, or after a backslash)
PS3 is used for shell menus with select
PS4 is prepended to execution traces (see Bash Debugger)

The Prompts PS1, PS2 and PS4 support placeholders that are replaced with current information such as the hostname, the username or the working directory (the full list is in the Bash man page and documentation).

Symbol	Description
`\h`	Hostname
`\u`	User name
`\w`	Current working directory
`\@`	Current time in 12-hour am/pm format
`\$`	For root "#", otherwise "$"

An example prompt:

PS1='\u@\h:\w$ '

This prints the username, an "@", the hostname, a colon, the working directory and a dollar sign (followed by a space).
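To try out a prompt string before assigning it to PS1, Bash 4.4 and later can expand it with the ${parameter@P} operator, which applies the same backslash escapes as the prompt. A minimal sketch:

```bash
#!/usr/bin/env bash
# Requires Bash >= 4.4 for the ${var@P} prompt-expansion operator.
prompt='\u@\h:\w\$ '
# user, "@", host, ":", working directory, then "#" for root or "$" otherwise
printf '%s\n' "${prompt@P}"

# \@ is not a literal "@": it expands to the current time (12-hour am/pm).
clock='\@'
printf '%s\n' "${clock@P}"
```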

Example for a colored prompt:

export PS1='\[\033[1;34m\]\u\[\033[0m\]@\[\033[0;32m\]\h\[\033[0m\]:\[\033[1;33m\]\w\[\033[0m\]$ '

Encoding

2013-11-20T10:25:00+01:00

Today, I was asked for help with encoding problems in vim. Since I knew that my colleague uses Windows on his desktop computer and an SSH connection to a Linux box, we first made sure that the shell-terminal connection worked properly.

Files and network connections carry byte streams, but both the terminal and the shell use an internal string representation. With UTF-8, a single character is encoded in one to four bytes.

Shell - Terminal

In the shell, the encoding is selected with the environment variable LANG (or one of the LC_* variables). On my German language system, I use LANG=de_DE.UTF-8. The shell and the SSH client must be configured to use the same encoding.

If they mismatch, strange things happen. When typing an umlaut (a special character that is not in basic ASCII, like the German "ö"), it appears on the screen, but removing it requires pressing Backspace twice. Or the umlaut does not appear immediately, but only after another key is pressed. This can be explained by the conversion of characters to a byte stream, both for sending the typed key to the shell and for receiving the new screen content.

If the shell uses latin1 and the terminal is configured for UTF-8, the umlaut is sent to the shell as a two-byte code, which the shell interprets as two strange symbols and echoes back. The terminal interprets those two bytes again as the umlaut, so far everything looks fine. But when Backspace is pressed, the shell removes one of the strange symbols, and the terminal is left with only one byte of the two-byte umlaut code. Only after a second press of Backspace are all bytes of the umlaut gone and the string is clean again.

In the opposite case, if the shell uses UTF-8 and the SSH client uses latin1, the umlaut is sent over the network as a single byte with a value between 128 and 255. In UTF-8, such a byte cannot stand alone: it is either part of a multi-byte sequence or invalid. After another key press on the terminal, the additional byte can complete the UTF-8 sequence (but it may happen that more bytes are required). For the shell, this is now one multi-byte character that can only be removed as a whole, but the SSH client happily displays the two original latin1 characters.
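The byte sequences behind these two scenarios can be made visible with od. A sketch, assuming iconv is available (it is part of most Linux systems); the "ö" is written as the escaped UTF-8 bytes \303\266, so the script does not depend on its own file encoding:

```bash
#!/usr/bin/env bash
# German "ö" (U+00F6) as UTF-8 bytes, written with octal escapes.
oe=$(printf '\303\266')

# UTF-8: two bytes. Read as latin1, c3 and b6 are the two characters "Ã¶".
printf '%s' "$oe" | od -An -tx1                                  # c3 b6

# latin1 (ISO-8859-1): a single byte above 127.
printf '%s' "$oe" | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1   # f6
```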

Applications

The next step is to ensure that the application works correctly together with the shell. In my experience, vim usually works correctly this way, but it may misinterpret the encoding of a file. Again, vim reads a stream of bytes from a file and manages them as characters, which are printed to the screen with the encoding that the shell uses.

In vim, the option fileencoding determines the encoding used when the file is written; to re-read a file with a specific encoding, use :e ++enc=latin1 (see :help ++enc). To be continued...

Git

2013-11-14T16:11:00+01:00

Git is the distributed version control system that was developed for the Linux kernel (started by Linus Torvalds in 2005). Today, it is widely used in the open source development community.

At work, I got in contact with Subversion and I really enjoy using it for all text-based projects (programming, LaTeX, you name it). However, the Linux kernel is managed with Git, and recently many projects have switched to Git. So I thought it was time to learn its basics, and while learning, I maintain my cheat sheet here. It is not intended to be a stand-alone tutorial; there are many good ones available (e.g. the Book on the Git homepage, which has also been translated into a number of languages). So this is merely a short reference. Further, my view is that of an experienced Subversion user, so I base my notes on previous knowledge about source code revision systems.

While Subversion is a server-based version control system (the central repository is a server and all interaction is done with this server), Git is a distributed version control system. That means that every user has a local repository (therefore "distributed") that stores all revisions. These local repos can be synchronized by pushing the changes to a server or by pulling changes from someone into the local repo.

A git server can be any system accessible via network either with SSH or HTTP(s). Github provides free Git hosting. For the following tests, I use my test repository.

Configuration

Every tutorial starts with the following lines to configure your name and E-Mail:

$ git config --global user.name "Your Name Comes Here"
$ git config --global user.email you@yourdomain.example.com

So it appears to be really advisable to do this. The background is that Git stores a name and an e-mail address with every commit. Unlike Subversion, where every user needs an account on the server, Git cryptographically hashes all commits, and anybody (under any alias) can create commits. The --global parameter writes the settings to the config file in your home directory. If you want to contribute to a project with a different identity (e.g. your company e-mail address instead of your private one), omit this option to store the settings only for the current project. The global config file is ~/.gitconfig and the per-project one is .git/config.

Please refer to the documentation (man git-config and git --help) for further information. The command git config -l lists the current settings (if multiple entries exist, the last one overwrites previous ones).
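To see the per-project override in action, the following sketch uses a throwaway repository; the e-mail address is a placeholder, and --show-origin requires Git 2.8 or later:

```bash
#!/usr/bin/env bash
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

# Without --global, the setting goes into this repository's .git/config
# and takes precedence over the one in ~/.gitconfig.
git config user.email you@company.example.com

git config user.email                          # the effective value for this repo
git config --show-origin --get user.email      # also shows the file it was read from
```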

Activate colors (older Git versions have them deactivated by default; since Git 1.8.4, color.ui defaults to auto):

$ git config --global color.diff auto
$ git config --global color.status auto
$ git config --global color.branch auto

Initialization

A new repository is initialized by entering

$ git init

in the base directory. If that's an existing project, you can add all files with

$ git add *
$ git commit -m "initial commit"

Instead of git add *, you probably want to add only selected files, such as *.c etc.

It's also possible to clone an existing repository (like a Subversion checkout):

$ git clone https://github.com/georgwassen/HelloWorld

The URL is provided by the project that offers the repository.

First steps

Files that should be committed to the repository must be added (this is called "staging"):

$ git add main.c
$ git add Makefile

Check the state and the staged changes:

$ git status
$ git diff --cached

Commit the staged changes (to the local repository) providing a change-log message:

$ git commit -m "initially adding my files"

If you don't provide the -m "message" parameter, Git will open an editor and ask for the commit message.

Now, further modifications can be done. Unlike Subversion, registered files are not automatically included in the next commit. With Git, the files must be added again:

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   main.c
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       hello
#       main.o
no changes added to commit (use "git add" and/or "git commit -a")
$ git add main.c
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   main.c
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       hello
#       main.o
$ git commit -m "added Bye message"
[master 6c8bb69] added Bye message
 1 file changed, 1 insertion(+)

The cycle is:

Make (and test) changes. Prefer small and related changes and commit them with meaningful messages.
Add changed files: git add file.c
Check state and if you missed a file: git status
Commit changes to local repository: git commit -m "message"

So far, all work is done in the master branch.
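The whole cycle can be tried in a throwaway repository. In this sketch, the file contents, messages and the demo identity (passed via Git's GIT_AUTHOR_* and GIT_COMMITTER_* environment variables, so that no config file is changed) are placeholders:

```bash
#!/usr/bin/env bash
# Identity via environment variables, so no git config is changed.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d) && cd "$work" || exit 1
git init -q
printf 'int main(void) { return 0; }\n' > main.c
git add main.c                         # stage the new file
git commit -q -m "initially adding my files"

printf '/* Bye */\n' >> main.c         # change a tracked file
git status --short                     # " M main.c": modified, not staged
git add main.c                         # stage the change
git status --short                     # "M  main.c": staged for commit
git commit -q -m "added Bye message"
git log --oneline                      # two commits, newest first
```

In the short status format, the left column shows the staging area and the right column the working copy.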

Upload to a server

As explained in the Github help, a local repository can be pushed to a server with the command

$ git remote add origin git@github.com:georgwassen/HelloWorld.git
$ git push origin master

(Using the URL for my test repos with SSH access.)

Later, or if the local repository was cloned from a server, it suffices to issue:

$ git push origin master

(`origin` is the name of the remote, i.e. the server, and `master` is the branch to push)

The full power of git

It is possible to synchronize the local repository with multiple servers.

Get the changes from a server:

$ git remote add upstream https://github.com/octocat/Spoon-Knife.git
$ git fetch upstream
$ git merge upstream/master

The strongest feature of Git is said to be branching and merging. Create a new branch and switch to it (there should be no open changes, i.e. a clean working copy):

$ git branch newfeature
$ git checkout newfeature

With git branch, all available branches are displayed. Now, you can switch between the branches and commit changes.

To merge changes from one branch to the other (for example, to merge the changes developed in a branch back to master), just call

$ git checkout master
$ git merge newfeature

A merged branch can be removed with git branch -d newfeature.

Branches in Git are very light-weight and fast, so they can be used to keep separate issues apart and merge them when they work. If there is a merge conflict (e.g. if both branches changed the same line), the conflict is marked and reported. You need to resolve the conflicts manually and then add the conflicting files to the staging area. When all conflicts are resolved, commit the staged files and the pending merge will be completed.

To get an overview of the current branch and how it's composed of commits and merges, the visual tool gitk is a great help. If you start it out of a Git repository, it displays the log of the current branch. With gitk --all, it displays all existing branches which helps to remember what branches are pending for a merge.
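A sketch of the branch-and-merge workflow in a throwaway repository; branch and file names are placeholders. The default branch is looked up instead of hard-coding master, because newer Git versions can be configured to use a different name:

```bash
#!/usr/bin/env bash
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
echo 'base' > main.c && git add main.c && git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)    # "master" or the configured default

git checkout -q -b newfeature            # short for: git branch + git checkout
echo 'feature' > feature.c
git add feature.c && git commit -q -m "add feature"

git checkout -q "$main"
git merge -q newfeature                  # fast-forward: no new commits on $main
git branch -d newfeature                 # allowed, because it is fully merged
```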

Creating a server repository

Git supports four protocols to interact with remote repositories:

- File: for local repos in other directories or on network drives (e.g. NFS)
- SSH: encrypted transfer for retrieving and uploading
- HTTP/HTTPS: easy publishing of source code; the simple ("dumb") HTTP protocol is download-only, while the "smart" HTTP protocol also supports uploading
- Git: only for download

Refer to the documentation for benefits and disadvantages and for details how to set these protocols up.

To create a personal repository on a Linux server accessible with SSH on the internet, I followed these steps:

Create a user dedicated to Git (or use your personal account). The commits are already tagged with name and e-mail; the git user is only used to determine read-only or read/write access to the repo. I copied ~/.ssh/authorized_keys from another account to enable public-key logins (sshd ignores the file if ~/.ssh or the file itself is writable by others, hence the restrictive permissions).
```
$ ssh www.example.com
$ su -
# useradd -m git
# su - git
$ mkdir -m 700 .ssh
$ cp authorized_keys ~/.ssh/
$ chmod 600 ~/.ssh/authorized_keys
```
Now, I can log in to the server with ssh git@www.example.com without being asked for a password. Note that every user who should access the Git repository via the SSH protocol must provide an SSH public key and subsequently can also log in on the server! (The git user can be configured to have no shell to disallow logging in, but that's out of the scope of this article.)
Create a Git repository. Usually, a server repo should be a bare one, without a working copy. The old version of Git on my server does not know the --bare parameter, but this is how it should work:
```
$ mkdir -p git/HelloWorld.git
$ cd git/HelloWorld.git
$ git init --bare
```
Now, the new repository has the Git URL git@www.example.com:git/HelloWorld.git. It is a convention to name bare repositories with a .git extension. When cloning such a repository, the local directory is named without this extension by default.
Add the new upstream repository to your local Git repository (where the working copy lives):
```
$ git remote add myrepo git@www.example.com:git/HelloWorld.git
$ git push myrepo master
```
With git branch -a or gitk, you can see that two remote repositories are listed now.
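To try these steps without a server, a bare repository in a local directory can stand in for the SSH URL, since Git accepts a plain path as a remote (the file protocol). A sketch with placeholder paths and names:

```bash
#!/usr/bin/env bash
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q --bare "$base/HelloWorld.git"     # the "server" repo, no working copy

git init -q "$base/work" && cd "$base/work" || exit 1
echo 'Hello World' > README
git add README && git commit -q -m "initial commit"
git remote add myrepo "$base/HelloWorld.git"  # a local path instead of git@host:path
git push -q myrepo HEAD                       # push the current branch
git remote -v                                 # lists myrepo (fetch and push)

cd "$base" || exit 1
git clone -q HelloWorld.git                   # creates the directory "HelloWorld"
```

The clone at the end shows the naming convention mentioned above: the .git extension of the bare repository is dropped for the working copy.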

Tips and Tricks

Amend to a commit

If you spot a typo in the commit message shortly after hitting ENTER, or forgot to compile the change and committed an error, the last commit can be amended. Use git commit --amend to edit the last commit message, or stage an additional file (or a file again) and commit it with --amend to add that change to the last commit.
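A sketch in a throwaway repository that fixes both a typo in the message and a forgotten file with a single amend; file names and messages are placeholders:

```bash
#!/usr/bin/env bash
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
echo 'int x;' > main.c && git add main.c
git commit -q -m "ad main.c"                        # typo in the message

echo 'int y;' > forgotten.h && git add forgotten.h  # file that belonged in it
git commit -q --amend -m "add main.c and header"    # replaces the last commit
git log --oneline                                   # still a single commit
```

Because --amend replaces the commit with a new one (with a new hash), it is best used only for commits that have not been pushed yet.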

Backdated Branch

Ever started committing changes and then realized that they should rather have gone into a branch? It's easy to move the last N commits into a new branch:

Create the new branch (it contains all commits in the current state, thus also the ones that should be moved into it).
Reset the current branch (not the new one) by N commits to remove the additional commits (below, N = 3).
Check out the new branch and continue.

$ git branch newbranch
$ git reset --hard HEAD~3
$ git checkout newbranch

Again: the reset is done on the master (or previous) branch where the last changes should be removed. The commits are preserved in the new branch.

Source: Stackoverflow
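The steps above can be replayed in a throwaway repository with four commits, moving the last three (N = 3) into newbranch. Note that reset --hard also discards uncommitted changes, so start with a clean working copy. A sketch with placeholder names:

```bash
#!/usr/bin/env bash
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
for n in 1 2 3 4; do                  # four commits on the current branch
    echo "$n" > file.txt
    git add file.txt && git commit -q -m "commit $n"
done
main=$(git symbolic-ref --short HEAD)

git branch newbranch                  # newbranch points at commit 4
git reset -q --hard HEAD~3            # the current branch goes back to commit 1
git checkout -q newbranch             # continue with commits 2 to 4
```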

Stashing

When changing between branches, the repository should be in a clean state. If you have modifications that are not yet ready for a commit but you need to change to another branch (e.g. for a hot fix), you can stash the pending changes. That's similar to a commit, but temporary.

$ git stash [save]
$ git checkout otherbranch
# do some modifications, e.g. to fix an error
$ git commit -am "hotfix..."
$ git checkout firstbranch
$ git stash pop

The stashed changes can be applied to another branch or to the branch where you stashed them. The stashing mechanism uses a stack of stashes. The command git stash list shows this stack, and each line indicates the branch on which the stash was created. See the book and the man page git-stash for more details.
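A sketch of the stash workflow in a throwaway repository; branch names, files and messages are placeholders, and the original branch name is looked up because it may be master or main depending on the Git configuration:

```bash
#!/usr/bin/env bash
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
echo 'base' > app.c && git add app.c && git commit -q -m "base"
first=$(git symbolic-ref --short HEAD)
git branch otherbranch

echo 'work in progress' >> app.c     # unfinished change
git stash >/dev/null                 # save it; the working copy is clean again
git checkout -q otherbranch
echo 'fix' > fix.c && git add fix.c && git commit -q -m "hotfix..."
git checkout -q "$first"
git stash list                       # one entry, created on $first
git stash pop -q                     # re-apply the change and drop the stash
```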

Import from Subversion

This part was moved to a dedicated article.

Display current branch in the Bash prompt

The bash-completion package contains a function that supports the display of the current Git branch in the Bash prompt.

Check if the function __git_ps1 exists:
```
$ type __git_ps1
```
If it displays a long function definition, continue with including it in PS1; otherwise, try the following:
- Install with the package manager, look for packages like bash-completion, git-extras etc.
- On Fedora, I found the file /usr/share/git-core/contrib/completion/git-prompt.sh and copied it to /etc/bash_completion.d/.
- Google for that function and add it either to your local .bash_profile or to the global /etc/bash_completion.d.

Include $(__git_ps1) in your PS1 definition. Example (with Colors):

export PS1='\[\033[01;32m\]\u@\h\[\033[01;34m\] \w\[\033[01;33m\]$(__git_ps1)\[\033[01;34m\] \$\[\033[00m\] '
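If __git_ps1 cannot be found at all, a much simpler stand-in can be written by hand. The function below is a hypothetical minimal version, not part of Git: it only shows the branch name and prints nothing outside a repository or on a detached HEAD, whereas __git_ps1 offers many more options:

```bash
#!/usr/bin/env bash
# Hypothetical minimal replacement for __git_ps1: prints " (branch)".
git_branch_ps1() {
    local branch
    # Fails outside a Git repository and on a detached HEAD.
    branch=$(git symbolic-ref --short HEAD 2>/dev/null) || return 0
    printf ' (%s)' "$branch"
}

# Single quotes: $(git_branch_ps1) is evaluated each time the prompt is shown.
PS1='\u@\h \w$(git_branch_ps1) \$ '
```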

Push into Working Repository

Usually, one should only push into a bare server repository. But sometimes, it happens that I clone a working repository from one PC (e.g. to my Notebook) and later push the changes back to continue working on the PC. When pushing to a working repository, the .git is updated, but the checked out branch is not modified (it's a bit like the difference between fetch and pull).

Now, the checked out working copy is out of sync with the .git repository backend data. This can be fixed with the simple command:

git checkout -f HEAD

Warning: the current branch should be clean because that command will overwrite modifications.
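Note that since Git 1.7.0, a push to the checked-out branch of a non-bare repository is refused by default (receive.denyCurrentBranch=refuse). Since Git 2.3, setting that option to updateInstead in the receiving repository makes the push update the working copy as well, provided it is clean. A sketch with two local repositories standing in for the PC and the notebook (paths and file names are placeholders):

```bash
#!/usr/bin/env bash
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/pc" && cd "$base/pc" || exit 1
echo 'v1' > notes.txt && git add notes.txt && git commit -q -m "v1"
# On the PC: let pushes update the checked-out branch and its files.
git config receive.denyCurrentBranch updateInstead

git clone -q "$base/pc" "$base/notebook" && cd "$base/notebook" || exit 1
echo 'v2' > notes.txt && git commit -q -am "v2"
git push -q origin HEAD                 # updates the PC's branch and working copy
```

With this setting, the git checkout -f HEAD step is no longer needed on the receiving side.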

References

I try to continue this little cheat sheet. But there's already a load of great tutorials on the 'net.

Free Book: Pro Git
Git Tutorial
Crash Course for Subversion Users Git - SVN Crash Course
Video Tutorials
- Github Webcast: The Basics of Git and GitHub (51m) Good first intro with history, background and basic usage.
- Scott Chacon: Introduction to Git (1h 22m) Introduction: He talks very fast and one should already have understood the basics. But then it is a very valuable overview of the power of Git.
- Tim Berglund: Git From the Bits Up (55m) Advanced look at Git
- Matthew McCullough: Advanced Git (1h 08m) Explaining the inner workings of Git (plumbing layer).

Overview

2013-10-29T20:07:00+01:00

These are books (some free on the web, some not), that every programmer and computer engineer should know (at least). I try to keep most of them in my L1 cache (i.e. on my desk).

Hardware

Patterson, Hennessy: "Computer Organization and Design"
Hennessy, Patterson: "Computer Architecture: A Quantitative Approach"

x86 Architecture

Intel 64 and IA-32 Architectures Software Developer Manuals

The reference for programming Intel's x86 processors on the operating system level.

Volume 1: Basic Architecture (important foundation)
Volume 2 (a, b): Instruction Set Reference (explains all assembly instructions in detail)
Volume 3 (a, b, c): System Programming (details about subsystems and features, such as cache, interrupts, power management, etc.)

Download link (if link is outdated, go to Intel's Website and search for "Software Developer Manual").

Intel 64 and IA-32 Architectures Optimization Reference Manual

Many details about the microarchitectures (Nehalem, Sandy Bridge, Haswell) and their specific behavior.

Download link (if link is outdated, go to Intel's Website and search for "Optimization Reference Manual").

Intel Chipsets

There are also manuals for Intel's chipsets (e.g. ICH 8/9/10), where their configuration is described (e.g. the details on the I/O APIC).

AMD64 Architecture Programmer's Manual

The reference for programming AMD's processors on the operating system level.

Volume 1: Application programming
Volume 2: System Programming
Volume 3: General Purpose and System Instructions (Assembler Reference)
Volume 4: 128-bit and 256 bit media instructions (SSE and AVX Reference)
Volume 5: 64-Bit Media and x87 Floating-Point Instructions (MMX and FPU Reference)

Download link (if link is outdated, go to AMD's Website and search for "Developer Guides and Manuals").

Also on that page:

Software Optimization Guide (with Details on specific processor versions)

Low-level Details for Optimization

This selection is biased towards x86 systems and the PC architecture (which is also employed in most HPC systems).

Hager: "Introduction to High Performance Computing for Scientists and Engineers"
Fog: "Software Optimization Manuals"
Drepper: "What every programmer should know about memory" Link

Programming and Algorithms

Sedgewick

C Programming

Kernighan & Ritchie: "The C Programming Language"

Programming Tools

Version control with Subversion
- there are translations (e.g. German), but they are not up to date with the English version.
Git Book

thorough introduction and documentation to the distributed source code version control system Git
- also in other languages, e.g. German

Shell Programming and Bash

Powers, Peek, O'Reilly, and Loukides: "Unix Power Tools"

Python

Operating Systems

General basics on operating systems:

Tanenbaum: "Modern Operating Systems"
Stallings: "Operating Systems: Internals and Design Principles"
Silberschatz: "Operating System Concepts"

Linux Kernel

Bovet: "Understanding the Linux Kernel"
Love: "Linux Kernel Development"
Corbet: "Linux Device Drivers"

Linux Application Development

Kerrisk: "The Linux Programming Interface"
Rochkind: "Advanced UNIX Programming"

Special topics

Nichols, Buttlar, and Proulx Farrell: "Pthreads Programming"
Gallmeister: "Posix.4: Programming for the real world"

Parallel systems and concurrent programming

Herlihy: "The Art of Multiprocessor Programming"
Vajda: "Programming Many-Core Chips" (2011)

Real-Time Systems

Liu: "Real-Time Systems"
Burns: "Real-Time Systems and Programming Languages"

Multi-processors in Real-Time

Moyer: "Multicore Embedded Systems" (2013)
- Very thorough textbook about multi-processor systems in embedded and real-time applications
Domeika: "Software Development for Embedded Multi-core Systems" (2008)
- Subtitle: "A Practical Guide for Using Embedded Intel Architecture"
- concentrates on the x86 architecture for embedded systems

More free books

Wikibooks: "A Little C Primer"

Wikibooks: "C Programming"

Wikibooks: "LaTeX"

Wikibooks: "x86 Disassembly"

Annotating

2013-10-29T10:22:00+01:00

Two LaTeX packages to add comments to documents while working together with others.

Package `todonotes`

The todonotes package allows inserting text boxes on the margin of the page. They are linked to the position in the text, where the \todo{} is entered.

An example can be seen at WriteLatex.

Package `trackchanges`

This package displays notes in the page margins, similar to the one mentioned above. However, it does not just collect to-dos, but supports a history of changes.

How this can be used is best seen in an example at WriteLatex.

Quotes

2013-10-18T10:12:00+02:00

31 October 2014

"In the majority of cases, performance will be programmer bound" - Barker's Law (as in Mike Barker)

Nitsan Wakart, Twitter via Highscalability.com

18 October 2013

Single core systems are becoming a historic curiosity, we should justify every piece of extra complexity we add for them.

Ingo Molnar (via LWN Quote of the Week)

10 January 2013

A programmer had a problem. He thought to himself, "I know, I'll solve it with threads!". has Now problems. two he

Davidlohr Bueso (via LWN Quote of the Week)

Problems

2013-10-03T09:41:00+02:00

Will Haldean Brown wrote about The problem with Vim. This made me smile, because I know the same situations.

But one thing bugs me even more: Every vi user knows the repetitive task of opening a file, making some changes and then: save and close, ESC, :wq, ENTER. That sequence is so engraved in the muscles, we don't think about it.

But once in a while, after entering text in a GUI window, the intended save-and-exit makes the fingers hit the well-known sequence of ESC, ... well, the remaining keys go unheard by the cancelled dialog window.

Okay, there are plugins for many applications to emulate vi keybindings, but there will be another dialog that has ESC hard wired with the cancel button.

Still not resolved this...

Endianness

2013-09-13T17:53:00+02:00

For multi-byte variables such as int, it matters in which order the bytes are stored in memory. The two possibilities are little endian and big endian.

But first, let's recap, how hexadecimal values are written and interpreted and how their bits are stored. The following variable i is of type integer and uses four bytes of storage:

int i = 0x12345678;

Each hexadecimal digit represents a nibble (four bits) with a value from 0 to f (representing 15). In this example, 8 is the least significant digit; its value is multiplied by $16^{0} = 1$. The next digit from the right, 7, must be multiplied by $16^{1} = 16$, thus it adds the value $7 \cdot 16 = 112$. The other nibbles are handled accordingly up to the leftmost (8th) place (value 1), which has a factor of $16^{7} = 268\,435\,456$.
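Written out as a sum, the positional expansion described above gives the decimal value of the example:

$$
\mathtt{0x12345678} = 1\cdot 16^{7} + 2\cdot 16^{6} + 3\cdot 16^{5} + 4\cdot 16^{4} + 5\cdot 16^{3} + 6\cdot 16^{2} + 7\cdot 16^{1} + 8\cdot 16^{0}
$$
$$
= 268\,435\,456 + 33\,554\,432 + 3\,145\,728 + 262\,144 + 20\,480 + 1\,536 + 112 + 8 = 305\,419\,896
$$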

Each nibble can easily be converted to binary because there are only 16 different values. For example: 8 = 0b1000. When those are written piece by piece, the hex value can be converted to binary:

0x12345678
= 0x    1    2    3    4    5    6    7    8
= 0b 0001 0010 0011 0100 0101 0110 0111 1000

As in decimal numbers, the least significant bit is far to the right and the most significant bit is left. Bitfields are usually displayed with bit 0 (the least significant) to the right and with increasing bit positions to the left.

Little endian

Systems using little endian byte order store the least significant byte at the lowest address.

int i = 0x12345678;

As in decimal numbers, the rightmost digit (8) is the least significant and the leftmost digit (1) is the most significant. Two hexadecimal digits (two nibbles) are stored in one byte:

↑ large addresses	0x1003	12hex	most significant byte
	0x1002	34hex
	0x1001	56hex
↓ small addresses	0x1000	78hex	least significant byte
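The byte order of the machine at hand can be checked from the shell: write the four bytes of the table above, lowest address first, and let od read them back as one 32-bit number in host byte order. A sketch; the octal escapes \170 \126 \064 \022 are the hex bytes 78 56 34 12:

```bash
#!/usr/bin/env bash
# Bytes in memory order (lowest address first): 0x78 0x56 0x34 0x12.
# od -tx4 interprets each group of four bytes in the host byte order.
value=$(printf '\170\126\064\022' | od -An -tx4 | tr -d ' ')
echo "$value"     # 12345678 on little endian, 78563412 on big endian
```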

This becomes twisted, if multiple bytes are displayed in a row. If the bytes in the row are numbered increasing from right to left, the above sequence of digits can be recognized easily:

0x100f	0e	0d	0c	0b	0a	09	08	07	06	05	04	03	02	01	0x1000
00	00	00	00	00	00	00	00	00	00	00	00	12	34	56	78

But if the bytes are numbered (more intuitively) from left to right, the sequence of pairs is reversed:

0x1000	01	02	03	04	05	06	07	08	09	0a	0b	0c	0d	0e	0x100f
78	56	34	12	00	00	00	00	00	00	00	00	00	00	00	00

Note that only the bytes (pairs of hexadecimal digits) are in a different sequence; each pair itself keeps its less significant digit on the right side (as in decimal numbers like 42). Some hex viewers show 16-bit words (shorts) instead of single bytes: when these are stored in little endian format and displayed from left to right, they show up as 5678 1234.

The first version appears preferable, at least for multi-byte integers. With character strings, this is different:

char str[] = "Hello world";

This is comparable to an array with the letter H in the first element, i.e. the lowest address. Placing this string behind the variable i and showing addresses increasing from right to left:

0x100f	0e	0d	0c	0b	0a	09	08	07	06	05	04	03	02	01	0x1000
'\0'	'd'	'l'	'r'	'o'	'w'	' '	'o'	'l'	'l'	'e'	'H'	12	34	56	78

In the case of strings, addresses increasing from left to right (as one reads English text) are favorable (by twisting the integer again):

0x1000	01	02	03	04	05	06	07	08	09	0a	0b	0c	0d	0e	0x100f
78	56	34	12	'H'	'e'	'l'	'l'	'o'	' '	'w'	'o'	'r'	'l'	'd'	'\0'

Big endian

In big endian, the least significant byte is stored at the largest address:

↑ large addresses	0x1003	78hex	least significant byte
	0x1002	56hex
	0x1001	34hex
↓ small addresses	0x1000	12hex	most significant byte

In this byte order, addresses are usually displayed increasing from left to right as this allows to read the multi-byte integer as well as the string:

0x1000	01	02	03	04	05	06	07	08	09	0a	0b	0c	0d	0e	0x100f
12	34	56	78	'H'	'e'	'l'	'l'	'o'	' '	'w'	'o'	'r'	'l'	'd'	'\0'

Network byte order

Documents, that are exchanged between systems and especially network transmissions should care for the byte order. In the internet protocols, a network byte order is defined (which is big endian). There are functions to convert network byte order to the host byte order:

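#include <arpa/inet.h>                  // POSIX; also available via <netinet/in.h>
uint32_t htonl(uint32_t hostlong);      // host to network, long (32 bit)
uint16_t htons(uint16_t hostshort);     // host to network, short (16 bit)
uint32_t ntohl(uint32_t netlong);       // network to host, long (32 bit)
uint16_t ntohs(uint16_t netshort);      // network to host, short (16 bit)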
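To see what htonl() does in memory, the converted value can be copied byte by byte into a buffer. The following is a minimal sketch that assumes a POSIX system with <arpa/inet.h>. The function name net_bytes is made up for this illustration:

```c
#include <stdint.h>     /* uint32_t */
#include <string.h>     /* memcpy */
#include <arpa/inet.h>  /* htonl, ntohl */

/* Hypothetical helper: store value in network byte order into out[0..3]
 * (lowest address first) and return it converted back to host order. */
uint32_t net_bytes(uint32_t value, unsigned char out[4])
{
    uint32_t net = htonl(value);     /* host -> network (big endian) */

    memcpy(out, &net, sizeof net);   /* copy the bytes as they lie in memory */
    return ntohl(net);               /* network -> host again */
}
```

Called with 0x12345678, net_bytes() should leave 12 34 56 78 in the buffer on little and big endian hosts alike, and it should return 0x12345678 again. On a big endian host, htonl() and ntohl() simply return their argument unchanged.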

Programming

When does a program need to care about endianness?

Of course, the byte order matters when a program exchanges data with other instances, via files or the network. Such instances are other programs or the same program running on a different system. The byte order can be ignored only if all systems use the same byte order (for example, all are x86 systems).

The Internet protocol (BSD sockets) libraries use network byte order. An IPv4 address given as an integer must be converted with htonl(), and a port number with htons().
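For example, when filling a struct sockaddr_in for an IPv4 socket, both the port and the address must be given in network byte order. This sketch assumes a POSIX system. The helper name make_addr is hypothetical, and the loopback address 127.0.0.1 (INADDR_LOOPBACK) is only an example:

```c
#include <stdint.h>      /* uint16_t */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* AF_INET */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <arpa/inet.h>   /* htonl, htons */

/* Hypothetical helper: prepare an IPv4 address structure for
 * 127.0.0.1:port, e.g. to pass it to connect() or bind() later. */
struct sockaddr_in make_addr(uint16_t port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);                 /* clear unused fields */
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(port);            /* 16 bit value: htons() */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 32 bit value: htonl() */
    return addr;
}
```

Port 8000 is 1F40 in hexadecimal. For that port, the two bytes of sin_port should read 1f 40 in memory, and the address bytes should read 127 0 0 1, regardless of the host byte order.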

In internal data structures, the byte order matters if unions or pointers are used to access portions of other variables. As long as only math operations and casts are used, it can be ignored:

#include <stdio.h>      // printf
#include <stdint.h>     // uint32_t, uint8_t

int main(void)
{
    union {
        uint32_t u32;
        uint8_t  u8[4];
    } demo;

    demo.u32 = 0x12345678;
    /*
     * when the bytes are accessed by address, the result depends on the byte order.
     */
    printf("lowest address: u8[0] = %hhx\n", demo.u8[0]);
    printf("highest address: u8[3] = %hhx\n", demo.u8[3]);
    /*
     * using math operations, the least significant byte can be masked or calculated
     * independently of the byte order.
     */
    printf("least significant byte: %hhx\n", (uint8_t)(demo.u32 % 256));              // modulo
    printf("least significant byte: %hhx\n", (uint8_t)(demo.u32 & 0xff));             // bitwise AND
    printf("most significant byte: %hhx\n", (uint8_t)(demo.u32 / (256 * 256 * 256))); // division
    printf("most significant byte: %hhx\n", (uint8_t)(demo.u32 >> 24));               // bit shift
    return 0;
}

(download this code.)
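The math-based approach also helps when exchanging data. If a value is split into bytes using only shifts and masks, the bytes come out in the same order on every host. This is a minimal sketch that writes and reads a 32 bit value in big endian (network) order. The names put_be32 and get_be32 are made up for this example:

```c
#include <stdint.h>

/* Hypothetical helper: store v in buf[0..3], most significant byte first.
 * Only shifts and masks are used, so the result is the same on any host. */
void put_be32(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)((v >> 16) & 0xff);
    buf[2] = (unsigned char)((v >> 8) & 0xff);
    buf[3] = (unsigned char)(v & 0xff);
}

/* Hypothetical helper: the inverse of put_be32(). The casts to uint32_t
 * avoid shifting a promoted (signed) int into its sign bit. */
uint32_t get_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

put_be32(buf, 0x12345678) should fill buf with 12 34 56 78 on any host. The buffer can then be written with fwrite() and read back with get_be32() on a system with a different byte order.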

Using external files

2013-09-12T15:06:00+02:00

Motivation

In my thesis, I cite a single website many times, and I don't want all of these citations in the bibliography (biblatex). First, I wrote a macro to format those citations uniformly:

% use: \lwn{article-no}{author}{title}{date}
\newcommand{\lwn}[4]{[LWN:#1]\footnote{#2: #3 (#4)
     \url{http://lwn.net/Articles/#1}}}

This allows writing:

In the merge window of Linux 3.12, several new features are
planned \lwn{565251}{Jonathan Corbet}{The 3.12 merge window opens}{September 5, 2013}...

This puts [LWN:565251] in the text and creates a footnote with the name of the author, the title, the date, and the URL.

The benefit of a macro pays off when creating a list of all such citations in the appendix. By changing only the macro, all citations can be collected automatically.

Index

With the \index{} macro, it is possible to write entries to the *.idx file. A script could post-process these to generate a piece of LaTeX that can be included in the document. But the entries will still show up in the index, unless the file is post-processed to remove them.

I used this in a Beamer presentation for a LaTeX tutorial to collect a list of all packages and all TeXdocs for the handout. For the Beamer slides, the index was not used otherwise.

\makeindex                         
\newcommand{\package}[1]{\texttt{#1}\index{package!#1}}
\newcommand{\texdoc}[1]{\textsc{#1}\index{texdoc!#1}}

You can use \package{tikz}, see also \texdoc{latex2e}.

The following was added to the Makefile:

grep 'package!' slides.idx | sed -e 's/\\indexentry{package!\(.*\)}{.*}/\1/' |sort|uniq > packages.txt
grep 'texdoc!' slides.idx | sed -e 's/\\indexentry{texdoc!\(.*\)}{.*}/\1/' |sort|uniq > texdoc.txt

Writing to files

LaTeX can open a new file and write to it (or read from it).

\newwrite\outputstream
\immediate\openout\outputstream=myfile.tmp
\immediate\write\outputstream{foo 1}
\immediate\write\outputstream{foo 2}
\immediate\write\outputstream{\string\textbf{foo 3}}
\immediate\closeout\outputstream

(Source of that example: Juanjo (latex-community)).

My new macro for the \lwn citation is now:

% use: \lwn{article-no}{author}{title}{date}
\newcommand{\lwn}[4]{[LWN:#1]\footnote{#2: #3 (#4) \url{http://lwn.net/Articles/#1}}\immediate\write\outputstream{#1;#2;#3;#4}}

% Open the file at the beginning of the document
\AtBeginDocument{%
    \newwrite\outputstream
    \immediate\openout\outputstream=lwncite.csv
}

% Close the file at the end of document
\AtEndDocument{%
    \immediate\closeout\outputstream
}

The macros \AtBeginDocument and \AtEndDocument register those pieces (opening and closing the file) for the beginning and end of the document.

The resulting file lwncite.csv contains lines with article number, author, title, and date, separated by semicolons. In the Makefile, it is sorted and reformatted:

sort lwncite.csv | awk -F';' '{print "[LWN:" $1 "] -- " $2 ". ``" $3 "''. http://lwn.net/Articles/" $1 " (" $4 ")"}' > lwncite.txt

This will be changed from text format to LaTeX as soon as I integrate the file into my document.

Fix "No room for a new \write"

If the compilation breaks with the error message "No room for a new \write", all 16 output streams that TeX provides are in use. The fix is simple: load the package morewrites at the beginning of the preamble, before any other packages. It reaches deep into LaTeX internals, but it has worked for me without further problems. Source

Using SQLite in C programs

2013-09-02T11:18:00+02:00

Intro

SQLite is a free local database with SQL interface. I've written an introduction for the basics.

For interfacing with C, SQLite offers two files: a C file (module) containing all functions, and a header file. They can either be compiled into your own program or used as a shared library. For the latter, distributions usually ship a -devel package. Version 2 is deprecated and should be replaced with version 3.

The most important data structures are sqlite3 for the database connection (similar to a file handle, it is opened, used for reading and writing and finally closed) and sqlite3_stmt for the queries. Queries are prepared, executed ("step"ed) and finalized.

The Database connection

A database is stored in a single binary file. Temporary data can be held solely in memory by using the special file name :memory:. The latter allows using SQLite for managing data, e.g. sorting.

The following program opens a database and closes it again. The return value is checked and an error message is printed if something went wrong.

#include <stdio.h>      // printf
#include <sqlite3.h>    // SQLite header (from /usr/include)

int main(void)
{
    sqlite3 *db;        // database connection
    int rc;             // return code

    /*
     * open an in-memory database;
     * pass a file name like "test.db" instead to use a database file
     */
    rc = sqlite3_open(":memory:", &db);
    if (rc != SQLITE_OK) {
        printf("ERROR opening SQLite DB in memory: %s\n", sqlite3_errmsg(db));
        goto out;
    }
    printf("opened SQLite handle successfully.\n");

    /* use the database... */

out:
    /*
     * close SQLite database (also needed if sqlite3_open() failed)
     */
    sqlite3_close(db);
    printf("database closed.\n");
    return 0;
}

This can be compiled as follows (provided that sqlite3-devel is installed):

$ gcc -O0 -g openclose.c -lsqlite3 -o openclose

A Makefile for this could look as follows. It assumes that the above file is saved as openclose.c and that pkg-config and sqlite3-devel are installed. It uses the built-in rules to create executables from C files:

CFLAGS=$(shell pkg-config --cflags sqlite3) -O0 -g
LDLIBS=$(shell pkg-config --libs sqlite3)

default : openclose

debug : 
    @echo CFLAGS = $(CFLAGS)
    @echo LDLIBS = $(LDLIBS)

The classic way

The basic workflow is always:

Prepare a query. This is usually a string (e.g. created with asprintf()) that can optionally contain placeholders for binding values. This generates a statement.
Execute the statement, either once (for creating tables, inserting, or updating) or until all resulting rows are read.
Free the resources by finalizing the statement.

Executing a static query

The following example creates a table. The query is a static string. This piece of code can be placed between sqlite3_open() and sqlite3_close() in the above example.

sqlite3_stmt *stmt;                                                                         /* 1 */

rc = sqlite3_prepare_v2(db, "CREATE TABLE demo (name TEXT, age INTEGER);", -1, &stmt, NULL); /* 2 */
if (rc != SQLITE_OK) {
    printf("ERROR preparing statement: %s\n", sqlite3_errmsg(db));
    goto out;
}

rc = sqlite3_step(stmt);                                                                    /* 3 */
if (rc != SQLITE_DONE) {
    printf("ERROR creating table: %s\n", sqlite3_errmsg(db));
    sqlite3_finalize(stmt);
    goto out;
}

sqlite3_finalize(stmt);                                                                     /* 4 */

1. A variable for the statement is declared.
2. The current version of the prepare function is sqlite3_prepare_v2().
- The first parameter is the database connection db.
- The second parameter is the query (UTF-8 string).
- The third parameter is the length of the query in bytes. It can be -1 for zero-terminated C strings.
- For stmt, its address must be given, because the pointer to the new statement is returned there.
- The final parameter can receive a pointer to the unused rest of the query (after its first statement). It is not needed here.
3. The function sqlite3_step() executes the prepared statement. Here, no result rows are expected, so the code only tests for success (SQLITE_DONE).
4. Free the resources of the prepared statement.

Execute a dynamic query

The next example uses asprintf() to dynamically create a query. This can be done with simple and trusted input. For content provided by users, the binding of values should be used as shown below.

char *query = NULL;

if (asprintf(&query, "insert into demo (name, age) values ('%s', %d);", "Tom", 20) < 0) {  /* 1 */
    printf("ERROR: asprintf() failed\n");
    goto out;
}

sqlite3_prepare_v2(db, query, strlen(query) + 1, &stmt, NULL);                          /* 2 */

rc = sqlite3_step(stmt);
if (rc != SQLITE_DONE) {
    printf("ERROR inserting data: %s\n", sqlite3_errmsg(db));
    goto out;
}

sqlite3_finalize(stmt);
free(query);                                                                            /* 3 */

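1. The function asprintf() allocates a buffer large enough to hold the created string and returns -1 on failure. It is a GNU extension (also available on the BSDs). With glibc, _GNU_SOURCE must be defined before including stdio.h.
2. The statement is now prepared from the query string. The length can be given instead of -1. According to the SQLite documentation, there is a small performance advantage if the length includes the terminating zero byte.
3. Don't forget to free the string allocated by asprintf().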

Once again: if a user provides malicious input ("SQL injection"), the database is in danger.

Binding values

The following example shows how to use place-holders and bind values to them.

sqlite3_prepare_v2(db, "insert into demo (name, age) values (?1, ?2);", -1, &stmt, NULL);       /* 1 */

sqlite3_bind_text(stmt, 1, "Susan", -1, SQLITE_STATIC);                                         /* 2 */
sqlite3_bind_int(stmt, 2, 21);                                                                  /* 3 */

rc = sqlite3_step(stmt); 
if (rc != SQLITE_DONE) {
    printf("ERROR inserting data: %s\n", sqlite3_errmsg(db));
    goto out;
}

sqlite3_finalize(stmt);

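1. Instead of literal strings and integers, the query contains the placeholders ?1 and ?2. Other forms (such as ?, :name, @name, or $name) are possible, see the documentation.
2. Bind the string "Susan" to ?1. The length -1 means that the string is zero-terminated. SQLITE_STATIC tells SQLite that the string stays valid and unchanged, so SQLite neither copies nor frees it. For an allocated string, you can instead give a function that frees the string after the statement is processed. Alternatively, SQLITE_TRANSIENT makes SQLite work on its own copy.
3. Bind an integer value (no freeing required).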

The rest is as before.

Query data with the classic method

If the prepared statement returns resulting rows, the function sqlite3_step() returns the value SQLITE_ROW and can be called until no more data is available. The columns can be accessed by index:

sqlite3_prepare_v2(db, "select distinct name, age from demo where age > ? order by 2,1;", -1,
        &stmt, NULL);

sqlite3_bind_int(stmt, 1, 16);                                                                  /* 1 */

while ( (rc = sqlite3_step(stmt)) == SQLITE_ROW) {                                              /* 2 */
    printf("%s is %d years old\n", (const char *)sqlite3_column_text(stmt, 0),
            sqlite3_column_int(stmt, 1));                                                       /* 3 */
}
if (rc != SQLITE_DONE)
    printf("ERROR reading data: %s\n", sqlite3_errmsg(db));

sqlite3_finalize(stmt);

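1. The prepared query contains a ? in the WHERE clause. As the first placeholder, it has the index 1. Bind an integer value to it.
2. Loop as long as sqlite3_step() delivers another row (SQLITE_ROW). Afterwards, rc should be SQLITE_DONE.
3. The data is accessed by column index, starting at 0. Strings are returned as const unsigned char *. They can be given to functions like printf() after a cast to const char *.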

One-Step Query Execution Interface

The convenience function sqlite3_exec() combines the above steps. However, it offers no way to bind values. For user input, use either very careful quoting or the classic way. The following example shows how to execute two SQL queries in one call. The resulting data is given to a callback function.

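/* the callback function must be defined at file scope, outside of main() */
static int callback(void *arg, int argc, char **argv, char **colName) {                         /* 1 */
    int i;
    for (i = 0; i < argc; i++) {
        printf("%s = %s\t", colName[i], argv[i] ? argv[i] : "NULL");
    }
    printf("\n");
    return 0;   /* a non-zero value would abort sqlite3_exec() with SQLITE_ABORT */
}

/* inside main(), between sqlite3_open() and sqlite3_close(): */
char *errmsg = NULL;
rc = sqlite3_exec(db,                                                                           /* 2 */
    "select count(*), avg(age) from demo; select distinct name, age from demo order by 1,2;",
    callback, NULL, &errmsg);

if (rc != SQLITE_OK) {                                                                          /* 3 */
    printf("Error in sqlite3_exec: %s\n", errmsg ? errmsg : sqlite3_errmsg(db));
    sqlite3_free(errmsg);
}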

1. Provide a callback function. Its first parameter receives the fourth argument of sqlite3_exec(). You can use it to pass data to the callback or to distinguish multiple uses. The query results are provided similar to main(), with argc and argv, plus the column names in a fourth string array. NULL values arrive as NULL pointers. Here, the example function just prints column = value pairs.
2. Execute one or multiple queries. The parameters are the callback function, its argument (here: NULL), and the address of a char pointer for an error message.
3. If an error occurred, errmsg points to an allocated error message. In this case, it should be freed using the function sqlite3_free().

More examples

More examples are in this source code archive.

SQLite

2013-09-01T19:14:00+02:00

SQLite is a free database engine that can be built into programs. There is also a command line tool. The data is stored in local files. SQLite is not intended for many concurrent users or as a web server back-end, though it can probably manage these, too. Its purpose is easy use from within programs. It can be used from shell scripts, C and C++ programs, Python programs and many more. Many web frameworks (like Django) use it to get development started quickly without having to install a database server.

SQL

The SQL dialect is a bit limited, but it is very easy to adapt to for those who know PostgreSQL, MySQL or Oracle. The most important difference is SQLite's relaxed handling of data types. The declared type of a column is only a preference (its "type affinity"), and a column can hold all sorts of data.

CREATE TABLE demo (ID INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO demo (name, age) VALUES ('Peter', 42);
UPDATE demo SET age = 41 WHERE name = 'Peter';
INSERT INTO demo (name, age) VALUES ('Tom', 20);
DELETE FROM demo WHERE age > 100;
SELECT * from demo;
SELECT name, age FROM demo WHERE age > 16 ORDER BY age;

A column declared with the exact type INTEGER PRIMARY KEY becomes an alias for the rowid. It is the only kind of column that holds nothing but integers. If no value is given for it, it automatically gets a new value for every new row. Commands must be terminated by a semicolon, and the case of keywords does not matter. Strings are given in single quotes.

The SQL dialect is documented on the web site.

Command line tool

The command line interface sqlite3 can be used to interactively control the database file or to check what your program has done. The database file is given on the command line. Without it, a temporary in-memory database is used.

$ sqlite3 test.db
sqlite> .schema         # shows the existing tables and their structure
sqlite> .mode column    # switch to a nicer display mode, csv is also possible
sqlite> .header on      # show column headers
sqlite> select * from demo;
sqlite> .quit

Use .help to show which internal (dot) commands are understood.

Global settings can be given in a file ~/.sqliterc, e.g.

.mode column
.header on

Interfacing from Shell scripts

A query can be given on the command line:

$ sqlite3 test.db "select * from demo;"

The format can be configured to columns (for viewing), csv (for importing data to spreadsheets), HTML and some more.

Because ~/.sqliterc may change the output format for human readability, scripts should always set the output format explicitly, e.g. sqlite3 -csv test.db "select * from demo;".

X11 Clipboard

2013-08-23T20:07:00+02:00

Two programs handle both of the X11 copy-and-paste mechanisms described below: xsel and xclip. They appear not to be installed by default (at least on OpenSUSE and Fedora), but the package managers should provide both.

Technical background

Note that there are two distinct mechanisms to copy and paste in X11: the selection and the clipboard. The selection is filled by highlighting some text with the mouse and pasted with the middle button (on most mice, the wheel). To be precise, there are primary and secondary selections, but the latter is hardly used. The clipboard uses Ctrl-C and Ctrl-V to copy and paste (or the entries in the Edit menu).

If I understood correctly, the X server manages the selection but does not store it. The server only mediates between the source and the target process. If the source process ends, the selection can no longer be pasted. Desktop environments have a clipboard manager (xclipboard, Klipper on KDE) that keeps the clipboard content.

Both programs xsel and xclip keep running in the background to provide their selection content and exit when another application takes over.

xsel

xsel is easier to use, because it detects whether you're using it for input or output:

xsel < file     # reads the content of file into the primary selection
xsel > file     # writes the primary selection to the file

By default, the program uses the primary selection (that one with the middle mouse button). With the parameter -b (or --clipboard), the clipboard (Ctrl-C, Ctrl-V) is used. The secondary selection can be addressed with -s (--secondary). The content of primary and secondary selection can be exchanged with xsel -x. It's also possible to delete the current content of the selection with -c (only if the source is xsel itself) and -d (requests the source program to discard the selection). Input can also be appended to the current content with -a.

To keep the content after the source program ends, use -k. According to the manual page, this requests the data from the current source process and stores it for pasting while running in the background.

There are options for the X11 interaction (display, etc.). Please refer to the man page if you need them. For me, both programs work without them.

xclip

The other program, xclip, is less convenient in day-to-day use. To get the content of the selection or the clipboard, it requires the command line option -o. Further, the non-default secondary selection and the clipboard are addressed with the long options -selection secondary and -selection clipboard (which can be abbreviated to -se s and -se c).

With -l (--loops), it can be instructed to wait in the background only for a number of requests from other applications before terminating.

More Examples

Both should work as expected in shell pipes: they place their input in the selection/clipboard and print its content when called.

Examples for the selection

Read a file into the selection (only xclip accepts the file name as an argument):

xclip file

With input redirection, both work:

xsel < file
xclip < file

Put something in the selection using a pipe:

some command | xsel
some command | xclip

Append the output to the selection (xclip can't do that):

some command | xsel -a

Retrieve the selection to the standard output:

xsel
xclip -o

Redirect the selection to a file:

xsel > file
xclip -o > file

Append the selection to the file:

xsel >> file
xclip -o >> file

Examples for the clipboard

Put something in the clipboard (e.g. to paste it in a GUI via Ctrl-V):

some command | xsel -b
some command | xclip -selection clipboard

Retrieve something from the clipboard (that was copied by Ctrl-C or Ctrl-X):

xsel -b
xclip -o -selection clipboard

Network

It is even possible to use them over X11-forwarding SSH sessions. A file read into xsel (or xclip) on a remote system can be retrieved locally and vice versa:

erde$ ssh -X sonne
sonne$ xsel < .bashrc
sonne$ exit
erde$ xsel > .bashrc

Note for gvim

The selection is in the register "*, the clipboard in "+.

Sources: commandlinefu.com, xsel project page, xclip project page and the man pages.

Markdown

2013-08-23T09:06:00+02:00

My Vim installation recognizes the extension .md as Modula. The filetype (for syntax highlighting) can be changed via

:set filetype=markdown

In other files, a modeline in a comment can change this on a per-file basis. But Markdown has no comment syntax of its own (or deactivated lines), so such a setting would show up in the final document. The following setting in .vimrc does the job instead:

au BufNewFile,BufRead *.md setlocal ft=markdown

Other useful settings:

imap <F5> <ESC>yypVr=o
imap <F6> <ESC>yypVr-o

Those two mappings set F5 and F6 to place equal-signs and dashes below the current line. It does so by copying yy and pasting p the current line, then marking the whole line V and replacing each character r=. To use this from input mode, it first issues ESC and finally o to continue writing in the line below. This helps a lot with the underlined type of headings.

SSH

2013-08-23T07:52:00+02:00

Basics

To log into a system with your current username, just call ssh hostname:

georg@erde:~$ ssh sonne
Password: ***********
georg@sonne~$

Note: the password will not be displayed (not even stars). If you have a different account on the other system, prepend your user name like in an e-mail address: ssh gw@sonne.

With the command line option -X (upper case X), the X11 session is forwarded over the SSH connection. When you start a program with a GUI, it is displayed on your local screen, and your input is redirected to the program running on the remote machine.

SSH can create a pair of public and private keys to log into systems without entering the password every time. This also comes in handy for scripts, where you would never put your password.

First, create a key pair:

$ ssh-keygen

It will ask for a name where to store the key files and for a passphrase. Take the word "passphrase" literally: this will be your master key, so don't just provide 8 characters (or none). I will show how to use the ssh-agent, so this passphrase only needs to be entered once after login. See man ssh-keygen for more options.

Distributing the public key

The private key must be kept secret. By default, it is stored in the hidden directory ~/.ssh/ in the file id_rsa and only readable for the user. (SSH will reject using it if the file is readable for more than the owner.)

The public key has the extension .pub (id_rsa.pub). To be accepted, it must be appended to the file ~/.ssh/authorized_keys on the target system. If your home directories are synchronized (e.g. via NFS), just append it to the local file:

~$ cd .ssh
.ssh$ cat id_rsa.pub >> authorized_keys

Otherwise, the public key must be copied to the remote machine. The tool ssh-copy-id helps: just call it with the hostname (or name@hostname, if the account is named differently):

georg@erde~$ ssh-copy-id mond
Password: ***********
Now try logging into the machine...
georg@erde~$ ssh mond
georg@mond~$

Using the SSH-Agent to avoid the passphrase

If you have followed the steps above, you have so far only replaced the (probably 8 character) password with a (according to my suggestion, much longer) passphrase. But as promised, there's help:

$ eval $(ssh-agent)
$ ssh-add
Passphrase: *************************
$ ssh sonne

The first line starts the SSH agent in the background. eval sets the environment variables that the agent prints (SSH_AUTH_SOCK, SSH_AGENT_PID), so that ssh-add and ssh can find it. Then ssh-add registers your private keys with the agent and asks for the passphrase. From now on, every subsequent SSH connection will use the password-less (and passphrase-less) login. If your public key is not on the target system, SSH falls back to asking for the password.

With the command line option -A, you can even forward the agent to a remote machine. From there, you can log in to the next machine without entering a password (or passphrase) again.

More configuration for more convenience

SSH can be configured with the file ~/.ssh/config. Some global settings are:

ForwardAgent yes
ForwardX11 yes

These settings globally activate the command line options -A and -X. Settings for specific target machines can be given:

Host work
    User gw
    Hostname mymachine.example.com

This would set an alias work for gw@mymachine.example.com.

LaTeX Links

2013-08-20T09:20:00+02:00

Typography

Book: Butterick’s Practical Typography

Tools

CTAN Package Catalogue, a comprehensive (not to say overwhelming) list of LaTeX packages. The detail pages tell whether the package is available in your LaTeX distribution and link to the documentation. The link goes to a load balancer that redirects to a mirror.
Writelatex.com, a collaborative online editor that even works on tablets and phones.

SQL plot

2013-08-19T21:31:00+02:00

Intro

For quickly processing benchmark data, I wrote a rather complex bash script that allows querying data from a SQLite database to create gnuplot graphics (2D, 3D and histograms). In retrospect, this could probably better have become a Python program, but at that time, I was learning Bash and was experienced with SQL after several years of a student job programming Oracle.

Example

A quick example: the benchmark was executed for a range of array sizes and a range of loading processes, each for reading and writing. It reported minimum, maximum and average values for array element access times. It ran for a full weekend and created 6400 files. A script processed the output files and created SQL insert statements. These were piped directly to sqlite3 to create a database file (run time: approx. 1 minute). Using sqlplot.sh, the session went approximately as follows:

$ sqlplot.sh data.db
Welcome to sqlplot 0.9 (14.6.2013)
Print help to get list of commands
sqlplot> desc data
CREATE TABLE data (
         id INTEGER PRIMARY KEY,
         load_method CHAR(2),
         load_range INTEGER,
         isol_method CHAR( 2),
         isol_range INTEGER,
         min NUMBER,
         avg NUMBER,
         max NUMBER);

> select load_method||'-'||isol_method, load_range, isol_range, max from data order by 1,2,3
load_method||'-'||isol_method  load_range  isol_range  max       
-----------------------------  ----------  ----------  ----------
r-r                            1024        1024        220       
r-r                            1024        4096        200       
r-r                            1024        16384       200       
r-r                            1024        32768       220       
r-r                            1024        65536       176       
r-r                            1024        131072      200       
r-r                            1024        262144      268       
r-r                            1024        524288      180       
r-r                            1024        1048576     204       
r-r                            1024        2097152     288       
...
rw-rw                          536870912   536870912   1213684   
> splot

The command splot creates a 3D plot. It uses the three rightmost columns as x, y and z values. Additional columns are used as keys for different data sets (here: 'r-r' etc.). The resulting picture looked like this:

Some further tweaking gnuplot's settings:

> set logscale xyz 2
> set ticslevel 0
> set xlabel 'load'
> set ylabel 'benchmark size'
> set title 'max latency'
> log2tics x 1024 4 536870912
> log2tics y 1024 4 536870912
> splot
> splot max.svg

The last line stores a SVG image that can easily be converted to PDF, e.g. using Inkscape. (I favored SVG over PDF export, because when I wrote sqlplot, the SVG export of gnuplot was much nicer than its PDF generation capabilities.)

Documentation

The output of the help command:

help [xxx] - help [topic xxx] 
exit       - quit program
desc       - list of tables or table description
select ... - SQL select statement
set ...    - set gnuplot options
unset ...  - remove gnuplot setting
show       - show current gnuplot settings
reset      - remove all gnuplot settings
plot       - xy plot of last query
hist       - histogram (bar plot) of last query
splot      - 3d plot of last query
load       - load *.sp script

Download

current version: sqlplot.sh

Beamer

2013-08-18T17:30:00+02:00

Beamer

The beamer package allows building presentation slides as PDF. If you are familiar with LaTeX (e.g. after writing a paper or thesis), it is really easy to create similar, perfect-looking slides.

to be continued...

(until then, refer to the documentation texdoc beamer)

RWTH Beamer Theme

2013-08-18T17:30:00+02:00

I really like the Beamer package for building presentation slides. But my university only has a Powerpoint template. So I went ahead creating a similar looking beamer style.

Version	Changes
0.4	2013-08-18 initial publishing

RWTH Beamer Templates

2013-08-18T17:30:00+02:00

I create my presentation slides with LaTeX Beamer. Since the university only provides PowerPoint templates, I created a Beamer style myself that imitates the look of the template as closely as possible.

Version	Changes
0.4	2013-08-18 first release

http Server

2013-08-16T08:23:00+02:00

Just enter (Python 2):

python -m SimpleHTTPServer

With Python 3, the module is called http.server:

python3 -m http.server

This will start a server for the current directory on port 8000. I would not expose it to the wider internet. But it is handy if you quickly want to transfer some files to another computer (when SSH is not an option...).

Thanks to commandlinefu.com (where you'll find a ton of other useful bits).

TexLive on OpenSuse

2013-08-16T08:23:00+02:00

Markus Kohm (the author of the KOMA-Script document classes) has created an OpenSuse RPM that provides the dependencies for other packages. Unlike the original OpenSuse texlive package, it does not install a fixed TexLive version (OpenSuse still ships 2012). Instead, it offers a GUI to download and maintain a TexLive installation from the TexLive repositories. Like a Linux distribution, TexLive is a distribution with mirrored repositories and a package manager that resolves dependencies.

The page is in German, but here's how it works:

Install the TexLive package via 1-Click Installation (the blue button). During this process, Yast/Zypper will ask to remove hundreds of texlive packages from its own repository.
Re-login. Your account was added to the new group texlive which is allowed to run the tool and the new group becomes active only after a new login (can be checked with id).
Call texlive-config either as normal user (to install just for yourself) or as root (for a system-wide installation with symlinks in /usr/local/bin)
The GUI offers some settings:
- The installed TexLive versions are shown on the left and can be activated alternately.
- Install a new TexLive version
Before installing, more choices can be made:
- If you're installing as normal user, make sure that you do not select "install to /usr/local/bin", this will fail and you have to restart the installation.
- Otherwise, when executing as root, you probably want to choose this setting.
After installing, you need either to re-login or to source /etc/profile.d/zzz-texlive.sh. The latter command sets environment variables that must be set for every shell. When logging in, the scripts in /etc/profile.d are executed and all shells inherit their settings.

C Makefile

2013-08-15T22:12:00+02:00

Example Makefile:

# braces instead of parentheses, because the pattern contains an unbalanced '('
MAIN=${shell grep -lE '\<main\>.*\(' *.c}
CFILES=$(filter-out $(MAIN), $(wildcard *.c))
OBJS=$(MAIN:.c=.o) $(CFILES:.c=.o)
EXEC=$(MAIN:.c=)
CC=gcc
CFLAGS=-g -O2
LDFLAGS=
LDLIBS=

DEFAULT: $(EXEC)

$(EXEC) : $(OBJS)
    $(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS)

The variable MAIN is populated by grep searching for a file that contains the word main followed by an opening parenthesis (not the most robust algorithm...).
CFILES are all other files with the file extension .c.
OBJS are all C-files (from MAIN and CFILES) with the extension .c replaced with .o.
EXEC will become the name of the executable. You can change this according to your preference.
Conventional variables used by the standard rules: CC is the name of the C compiler, CFLAGS are given for compiling, LDFLAGS are given for linking and LDLIBS are libraries that need to be linked.
The first rule is the default when make is invoked without parameters. To avoid confusion and accidental overwriting, a striking DEFAULT-target is used that just tells make to build the executable.
The only rule is given to link the executable EXEC from all object files.
For the object files, we don't need to give a rule because make has standard rules for such common tasks.
The variables CC, CFLAGS, LDFLAGS, and LDLIBS are also used by built-in rules from GNU make (gmake) that can be analyzed by make -p.

LaTeX Makefile

2013-08-15T22:12:00+02:00

Example Makefile:

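# The main document is the file containing \documentclass
# (-F searches for a fixed string, so the backslash is taken literally).
MAIN=$(shell grep -lF '\documentclass' *.tex)
TEXFILES=$(filter-out $(MAIN),$(wildcard *.tex))
PDFLATEX=pdflatex
PDF=$(MAIN:.tex=.pdf)

.PHONY: DEFAULT pdf
DEFAULT: pdf

pdf : $(PDF)

# Run pdflatex twice so that cross-references are resolved.
$(PDF) : $(MAIN) $(TEXFILES)
	$(PDFLATEX) $(MAIN)
	$(PDFLATEX) $(MAIN)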

The variable MAIN is populated automatically by grep, searching for a file that contains the string \documentclass (hopefully it finds the correct one...).
TEXFILES are all other files with the extension .tex.
PDF will become the name of the generated PDF. It is derived from the name of the MAIN file. You can change this according to your preference.
PDFLATEX is the program name to generate PDF from LaTeX input.
The first rule is the default when make is invoked without arguments. To avoid confusion and accidental overwriting, a conspicuous DEFAULT target is used that just tells make to build the PDF.
The rule for make pdf is probably only useful if rules for PostScript and DVI are implemented as well. For beginners: stick to pdflatex to generate PDF files directly. The old way via DVI is obsolete (unless your printing service can only handle PostScript).
The PDF depends on all .tex files, so it is rebuilt whenever any of them changes. To get the references right, pdflatex is called twice.
If you're using an index or BibLaTeX/BibTeX, append the corresponding call (e.g. makeindex, biber or bibtex) and then add another two calls to pdflatex.
If the generation takes too long, add a quick target that unconditionally calls pdflatex once. After fixing typos or making only minor modifications, this often suffices.

.PHONY: quick
quick :
	$(PDFLATEX) $(MAIN)

Important C Libraries

2013-08-15T21:14:00+02:00

If you're using these libraries for your projects, you probably need to install the -devel versions from your distribution's repository.

Contents:

libc - the Standard C Library
- math
- POSIX threads
Gnome universe
- glib
- GTK+
GNU Scientific Library

libc - the Standard C Library

The C library is part of the C standard and contains functions such as printf() and malloc(). It is linked via -lc, but the compiler driver does that by default, so you don't need to pass this option.

The most important (okay, for open source programmers) implementation is the GNU libc (glibc). Don't confuse the glibc with the glib (described further below).

math

The math library contains functions more advanced than a simple calculator (think scientific calculator, but see also the GNU Scientific Library, further below).

Although the math functions belong to the C standard library, with glibc they live in a separate library, libm, which must be linked explicitly with -lm.

POSIX threads

Threads are multiple paths of execution inside a process that share its address space. Long-running tasks can be delegated to threads to avoid blocking the main thread. On today's multi-processor systems, threads can be executed concurrently to speed things up. But in any case, synchronization, especially for access to shared resources, must be taken care of manually.

Compile and link with -pthread (with GCC, use it consistently for both steps), or link with -lpthread.

Gnome universe

glib

This library started as part of GTK+ and provides the basic functionality (data structures, an event loop, portability helpers) that GTK+ and GNOME are based on.

GObject (the object system built on top of GLib): t.b.d.

GTK+

The object-oriented widget toolkit used for the GNOME GUI and many other programs.

GNU Scientific Library

t.b.d.
