Check trust #130

Merged
merged 5 commits into from
Jul 10, 2020

Collaborator

@donald donald commented Jul 7, 2020

Show an alert on the lightdm greeter (login screen) and the getty prompt if the system lost trust (was removed from the "amd" group).

Requires mariux64/bee-files#1845
Addresses mariux64/bee-files#1842

Use sudo systemctl enable getty-checktrust after installation.

Contributor

@pmenzel pmenzel left a comment

Looks good. Using clusterd sounds like overkill though.

Wouldn’t trying to access a directory like /scratch/tmp be enough?

@@ -0,0 +1,27 @@
#! /usr/bin/bash

(while true; do xdotool search --sync --name bla windowraise; sleep 1; done) &

Contributor

Could you please extend the commit message (or add a comment here) to explain what xdotool is needed for?

Collaborator Author

The greeter and potentially the dialog box are two windows on an X11 screen without a window manager. They run in parallel, and the order in which they realize their windows is random, so the stacking order is random, too. Typically the greeter needs longer to realize its window, which is fullscreen and would completely cover the error dialog.

xdotool waits until the dialog window is realized and then raises it above the login window. Because the dialog is small and centered, both windows are then visible.

Collaborator Author

Under normal circumstances this would not need to be a loop, because only the first dialog is affected. If the user clicks "OK", the conditions are reevaluated and typically another dialog will appear. But this time it comes up after the greeter window and is thereby on top by itself. Without a window manager, users can't change the layout. But who knows, people are creative. Perhaps the user is asleep with their head on the "Return" key and manages to dismiss more than one error dialog before the greeter has finished its setup. Or maybe when you enter a username with more than 127 funny Unicode characters, the login dialog segfaults but has an internal signal handler to restart from scratch....

Collaborator Author

donald commented Jul 7, 2020

> Wouldn’t trying to access a directory like /scratch/tmp be enough?

There are a million other reasons why access to /scratch/tmp might fail. Maybe "moep" is down or has problems with dead processors, maybe automount is not yet started or someone mis-edited the map, maybe the network cable is not plugged in, ...

> Using clusterd sounds like overkill though.

In what regard? I'm sure an automount of a remote NFS filesystem consumes a lot more resources on client and server and network than a single short reply over TCP. netcat -w 1 $host 236 </dev/null : 8 packets. ls -ld /scratch/tmp : 105 packets plus 19 to unmount after the timeout. :-)

Collaborator Author

donald commented Jul 8, 2020

> "consumes a lot more resources" "and server"

I take that back. Although NFS has more complexity, it is handled in-kernel. clusterd needs a fork and several system calls.

Another problem that came to my mind with the test-mounting alternative is that the timeouts are rather long, undefined and unlimited.

Collaborator Author

donald commented Jul 8, 2020

Perhaps we should retry forever until we get a verdict (e.g. when the network cable is not plugged in, or all systems and network equipment are restarting after a power drop).

Collaborator Author

donald commented Jul 9, 2020

I'll clean this up a bit.

Contributor

pmenzel commented Jul 9, 2020

So, /node/issue.d is for storing issues? What other issues could there be?

Add a very simple TCP service on port 236 to clusterd which can be used
by other hosts to query whether they are still trusted.

clusterd replies with either "I trust you\n" or "I don't trust you\n"
depending on whether the connecting host has the amd hostconfig flag
or not. After sending the message, clusterd will hang up.
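A client only has to read the single line clusterd sends and map it to a verdict. The sketch below is an assumption about how that could look from a shell: the function names `classify_reply` and `query_trust` are hypothetical and not taken from this PR, and the netcat invocation is the one quoted earlier in this conversation.

```shell
#!/usr/bin/bash

# Map one reply line from clusterd to a verdict.
# Prints "trusted", "not trusted", or nothing for anything unexpected.
classify_reply() {
    case "$1" in
        "I trust you")       echo "trusted" ;;
        "I don't trust you") echo "not trusted" ;;
    esac
}

# Ask a single host on port 236 and give up after one second.
# An unreachable host yields an empty reply and therefore no verdict.
# (Hypothetical helper; the real client code may differ.)
query_trust() {
    local host=$1 reply
    reply=$(netcat -w 1 "$host" 236 </dev/null 2>/dev/null | head -n 1)
    classify_reply "$reply"
}
```

Keeping the parsing separate from the network call means the mapping can be checked without a running clusterd.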
Add a script to determine whether the system has lost the trust of other
systems. Query a few remote systems which are supposed to be online most
of the time.

Note that this script has a tristate result (trusted, not trusted,
unknown) so we don't communicate the result via exit status, but output
"trusted", "not trusted" or nothing.
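Because there are three outcomes, a caller compares the script's output rather than its exit status. A minimal sketch of that idea follows. `first_verdict`, `report`, and passing the per-host query as a command name are illustrative assumptions, and the real script may combine the answers of several hosts differently.

```shell
#!/usr/bin/bash

# Ask each host in turn using the command named in $1 and print the first
# definite answer ("trusted" or "not trusted"). Print nothing if no host
# gave a usable reply, which the caller treats as "unknown".
first_verdict() {
    local query_cmd=$1 host verdict
    shift
    for host in "$@"; do
        verdict=$("$query_cmd" "$host")
        case "$verdict" in
            "trusted"|"not trusted")
                echo "$verdict"
                return 0
                ;;
        esac
    done
    return 0
}

# Callers branch on the output, not on the exit status.
report() {
    case "$1" in
        "trusted")     echo "system is trusted" ;;
        "not trusted") echo "system lost trust" ;;
        "")            echo "no verdict yet" ;;
    esac
}
```

An empty string as the third state keeps shell callers simple: `[ -z "$verdict" ]` is all that is needed to detect "unknown".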
Install three new files into the system:

- /etc/xdg/lightdm/lightdm.conf.d/50-use-wrapper.conf
- /usr/libexec/lightdm-greeter-wrapper
- /usr/libexec/lightdm-show-trust-warning

The first file adds a configuration option to lightdm to invoke the
greeter via a wrapper. The second file is the wrapper script, which
forks off the third script before exec-ing into the greeter.
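The wrapper pattern is small: start the helper in the background, then replace the wrapper process with the greeter. The sketch below assumes that lightdm passes the greeter command line as arguments to the wrapper. The function form and the `warning_cmd` parameter are only there to make the example self-contained; this is not the script from the PR.

```shell
#!/usr/bin/bash

# Start the helper in the background, then become the greeter.
# $1 is the helper to fork off; the remaining arguments are the greeter
# command line. exec replaces the current process, so nothing after it runs.
greeter_wrapper() {
    local warning_cmd=$1
    shift
    "$warning_cmd" &
    exec "$@"
}
```

In an installed wrapper this would presumably be called as `greeter_wrapper /usr/libexec/lightdm-show-trust-warning "$@"`. Using exec means lightdm keeps supervising the greeter itself rather than an intermediate shell.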

The third script uses /usr/sbin/trustcheck to find out whether we lost
trust of the other nodes. If it gets a negative verdict, it shows a
dialog on top of the login screen to alert the user about the condition.
If it doesn't get a verdict, it keeps asking (e.g. when the network is
not plugged in).
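The retry behaviour can be sketched as a loop that polls until the check prints something. This is an illustration under assumptions: `wait_for_verdict`, `needs_warning`, and the polling interval are hypothetical, and the command that finally shows the dialog is left out because the text above does not name it.

```shell
#!/usr/bin/bash

# Poll the check command ($1) every $2 seconds until it prints a verdict,
# then print that verdict. An empty answer means "unknown", so keep asking.
wait_for_verdict() {
    local check_cmd=$1 interval=$2 verdict
    while true; do
        verdict=$("$check_cmd")
        if [ -n "$verdict" ]; then
            echo "$verdict"
            return 0
        fi
        sleep "$interval"
    done
}

# Only a negative verdict leads to a warning.
needs_warning() {
    [ "$1" = "not trusted" ]
}
```

A caller might run `verdict=$(wait_for_verdict /usr/sbin/trustcheck 5)` and open the dialog only if `needs_warning "$verdict"` succeeds; the interval of 5 seconds is an arbitrary choice for the example.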

xdotool is used to raise the dialog above the (full screen) login
window. This has to be done in a loop, because we don't know how long
the login window needs to appear and pop up in front of the dialog.

Create a service "checktrust" which is run before getty is started. If
this service detects that the system has lost trust, a warning message
is dropped into /node/issue.d/notrust.issue.

Create a symlink for agetty in /etc/issue.d to the (only possibly
existing) file in the /node path. agetty shows this message before
the login prompt.
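The getty side boils down to writing, or not writing, one file before the login prompt appears. The following sketch is an assumption about how such a step could look: `update_issue_file` and the warning text are made up for illustration, and removing a stale file when there is no negative verdict is a guess at what "only possibly existing" implies.

```shell
#!/usr/bin/bash

# Write a warning into the issue file ($2) when the verdict ($1) is
# "not trusted"; otherwise make sure no stale warning is left behind.
update_issue_file() {
    local verdict=$1 issue_file=$2
    if [ "$verdict" = "not trusted" ]; then
        mkdir -p "$(dirname "$issue_file")"
        # Placeholder text, not the message used by the PR.
        printf 'WARNING: this system is no longer trusted by the cluster.\n' > "$issue_file"
    else
        rm -f "$issue_file"
    fi
}
```

With the symlink from /etc/issue.d in place, a call such as `update_issue_file "$(/usr/sbin/trustcheck)" /node/issue.d/notrust.issue` would be enough for agetty to pick up the message; a dangling symlink would simply mean there is nothing to show.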

checktrust-for-getty: Use checktrust command

Collaborator Author

donald commented Jul 9, 2020

Jul 09 13:39:45 sigusr2.molgen.mpg.de systemd[1]: Starting Check Mariux64 trust for getty...
Jul 09 13:39:45 sigusr2.molgen.mpg.de getty-checktrust[327]: afk: forward host lookup failed: Host name lookup failure : Resource temporarily unavailable
Jul 09 13:39:51 sigusr2.molgen.mpg.de systemd[1]: Started Check Mariux64 trust for getty.

This might be the same thing as what is happening to exportfs (mariux64/bee-files#1841).
It's reproducible: the error goes away when 127.0.0.1 is removed from /etc/resolv.conf; otherwise it happens every time. After=unbound.service doesn't help.

It also doesn't happen when afk is put into /etc/hosts. This is only relevant during testing now, because currently only afk has the modified clusterd. When this is merged, every system will have the modified clusterd, and wtf already is in /etc/hosts (for whatever reason).

Collaborator Author

donald commented Jul 9, 2020

> So, /node/issue.d is for storing issues? What other issues could there be?

We invented "/node" for things we need in the filesystem that differ from node to node. Now we also have "/etc/local" for static configuration, so "/node" is left for things which are dynamically generated. These things could now be in /var/run as well.

"/etc/issue.d" is from agetty and "/node/issue.d" augments it. The exact semantics ("what other issues could there be") are defined by agetty. Because we want one of the files to be dynamic (computed during boot or later, not disted), we dist the symlink to "/node/" in "/etc".
