Feed of the Publications list #117

ballaschk · 2021-07-29T09:13:56Z

Hi Donald,

Can we have some kind of feed generated from the list that I could use in the Concerto signage system and transform with XSLT? Ideally, the fields (authors, title, source, group) would be marked up individually / not summarized into a single field and grouped as items. Then I could select and format them individually. I actually only need the top 3 items.

Just in case this violates some kind of RSS standard, it would be great if the fields could be merged as follows:
Authors Title Source Groups

(And if it helps, you could just strip all HTML/markup from the fields in the backend, since the fields are formatted in exactly the same way anyway.)

At the moment, I manually add the entries to the Concerto screen, but once the RSS engine of Concerto is running properly, I'd like to automate this and it would be handy to have the feed ready to make it happen …

Best wishes,
Martin

donald · 2021-08-06T04:01:14Z

Would you prefer an RSS (or Atom) feed or just a naked XML representation of the last three publications? Although an RSS feed might look more standard, the attributes RSS suggests don't really apply (or we don't have the data), like publication time, link to the article, or channel description. If you are going to parse/translate the XML by means of XSLT anyway, vanilla XML of the data might be enough and easier to understand, and we wouldn't need to invent metadata to make RSS readers happy.

Both variants are rather easy to implement. No difference in that regard.

donald · 2021-08-06T08:15:51Z

I've got a prototype which produces items like

<?xml version="1.0" encoding="utf-8" ?>
<publication-list>
    <publication doi="10.1016/j.molcel.2021.06.026">
        <authors><p>Arnold M, Bressin A, Jasnovidova O, Meierhofer D, Mayer A.</p></authors>
        <title><p>A BRD4-mediated elongation control point primes transcribing RNA polymerase II for 3′-processing and termination.</p></title>
        <source><p><i>Mol Cell.</i></p></source>
        <groups><p>(Mayer Lab)</p></groups>
    </publication>
    [...]
</publication-list>

The <p>s are an unwanted editor artifact, and the <i>s should not be in the database field but in an external style. I could migrate that (and turn the rich text fields into text fields). However, we sometimes have additional markup like link targets or internal <i>s in the fields:

<title><p><a href="https://doi.org/10.1093/nar/gkab208">Conserved DNA sequence features underlie pervasive RNA polymerase pausing</a>.</p></title>
<title><p><a href="https://pubmed.ncbi.nlm.nih.gov/33639093/">Snapshots of native pre-50S ribosomes reveal a biogenesis factor network and evolutionary specialization</a>.  </p></title>
<title><p><a href="https://www.nature.com/articles/s41586-021-03208-9">Noncoding deletions identify <i>Maenli</i> lncRNA as a limb specific <i>En1</i> regulator</a>. </p></title>
<source><p><i>Cancers </i>11 (9) (2019)</p></source>
<authors><p> Alahmad, A., Paffrath, V., Clima, R., Busch, J. F., Rabien, A., Kilic, E., <i>et al.</i>, ..., Meierhofer, D.</p></authors>

But am I right that the CMS currently doesn't show historical entries anywhere, only the last 5 publications on the front page? Maybe we shouldn't be too concerned with old entries?

To guarantee that valid XML is produced, I'd need to validate and escape the strings from the database, because they might contain illegal characters or invalid markup that messes up the DOM tree. But then the XML would be less readable if the internal data was escaped with entity names or encapsulated in <![CDATA[...]]>. To be 100% correct, we might need to generate multiple <![CDATA[...]]> sections if the raw data contains ]]> :-). Sigh, I don't suppose you can work with JSON?
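A minimal sketch of the escaping described above, using only the Python standard library. The function names are hypothetical and this is not the code behind the prototype; it only illustrates the two options (entity escaping versus CDATA with splitting on `]]>`) and the removal of characters that XML 1.0 does not allow.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production is illegal in a document.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_chars(raw: str) -> str:
    """Drop characters that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", raw)


def escape_text(raw: str) -> str:
    """Option 1: escape &, < and > so the string is safe as element content."""
    return escape(strip_illegal_chars(raw))


def to_cdata(raw: str) -> str:
    """Option 2: wrap the string in CDATA, splitting wherever it contains ']]>'."""
    # ']]>' would end the section early, so close the section after ']]'
    # and open a new one that starts with '>'.
    body = strip_illegal_chars(raw).replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + body + "]]>"


if __name__ == "__main__":
    print(escape_text("<i>Mol Cell.</i> & more"))
    tricky = "a ]]> b"
    wrapped = to_cdata(tricky)
    print(wrapped)
    # Round trip: the parser joins both CDATA sections back into one string.
    print(ET.fromstring("<title>" + wrapped + "</title>").text == tricky)
```

Either variant keeps the document well-formed; the CDATA form is easier to read in the raw feed, the escaped form is simpler to generate.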

Do we want to migrate the fields from rich text to plain text? That would be cleaner, but extra work in the short term. If not: do we want to guarantee valid (and secure) XML, or do we just trust the CMS author not to do stupid things... Hey, I guess that's you.

Can we trust the doi format (so that we can make it into an attribute)? I can make it an element with free text content as well.

ballaschk · 2021-08-06T08:16:45Z

Naked XML should suffice, as the Concerto plugin states that it can parse "RSS and other XML feeds". And as you said, RSS would not really make sense for this type of data.

But then, the example given at the link above does not work in Concerto and gives me the error message "URL does not appear to be an RSS feed". My guess is that there is a problem on the other end (Concerto plugin) that Peter would have to have a look at since I don't understand the code. But it should work in principle and the XML feed is a prerequisite to make progress in that regard …

donald · 2021-08-06T08:41:36Z

Prototype: https://intranet2.molgen.mpg.de/feed/last_publications

Can Concerto parse that?

ballaschk · 2021-08-06T09:04:09Z

The prototype looks exactly like what we need.

> I could migrate that (and make the rich text fields into text fields). However, we sometimes have additional markup like link targets or internal <i>s in the fields:

The links in the fields were just me being lazy and not bothering with the DOI. Turning them into plain text should be fine.

> But am I right that the CMS currently doesn't show historical entries anywhere, only the last 5 publications on the front page? Maybe we shouldn't be too concerned with old entries?

Correct.

> To guarantee that valid XML is produced, I'd need to validate and escape the strings from the database, because they might contain illegal characters or invalid markup that messes up the DOM tree. […] Sigh, I don't suppose you can work with JSON?

Hm, wouldn't it be enough if I promise to put only "legal" characters into those text fields? JSON is not an option for Concerto, I guess. This is also just an "internal" feed, so in case I mess up, nothing important blows up.

> Can we trust the doi format (so that we can make it into an attribute)? I can make it an element with free text content as well.

This I don't understand. Isn't it a text-only element already?

ballaschk · 2021-08-06T09:10:27Z

> Prototype: https://intranet2.molgen.mpg.de/feed/last_publications
> Can Concerto parse that?

It says "Unable to preview. feed could not be parsed" and "URL does not appear to be an RSS feed" but I am still working on the transform markup, maybe I messed this up. Unfortunately, I can't access the Concerto logs to see where it went wrong.

I will figure it out, I just found https://www.w3schools.com/xml/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog

donald · 2021-08-06T09:57:59Z

> Unfortunately, I can't access the Concerto logs to see where it went wrong.

less or tail -f of /project/signage/concerto/log/production.log should work for you now.

ballaschk · 2021-08-06T09:58:59Z

Your prototype file is perfect and I managed to write a valid XSLT stylesheet that produces exactly what I want, but Concerto won't parse the feed.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html> 
<body>
  <h1>Latest three papers</h1>
    <xsl:for-each select="publication-list/publication">
      <p>
        <xsl:value-of select="authors"/>&#160;
        <b><xsl:value-of select="title"/>&#160;</b> 
        <i><xsl:value-of select="source"/>&#160;</i> 
        <xsl:value-of select="groups"/>
      </p>
    </xsl:for-each>
<p><em>Please submit new papers to news@molgen.mpg.de!</em></p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

donald · 2021-08-06T10:19:35Z

> Hm, wouldn't it be enough if I promise to put only "legal" characters into those text fields? JSON is not an option for Concerto, I guess. This is also just an "internal" feed, so in case I mess up, nothing important blows up.

Just wanted to make sure that you know who is to blame if the foyer display explodes because of invalid markup :-)

> > Can we trust the doi format (so that we can make it into an attribute)? I can make it an element with free text content as well.
>
> This I don't understand. Isn't it a text-only element already?

Yes. But XML has a restricted character set (no control characters apart from tab, line feed and carriage return). And in attributes, some legal characters (<, & and the quote character that delimits the attribute value; > is usually escaped as well) need to be escaped, which the prototype doesn't do yet. And attribute data is subject to normalization (e.g. tabs and newlines are turned into spaces, and runs of whitespace might be reduced to a single space). All this doesn't matter if you put only a normal identifier into that field and not something like "><title><script src="https://123.456.7.8/do-something-nasty.js"> :-)
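To illustrate the attribute issue, here is a small sketch using `quoteattr` from the Python standard library. The helper name `publication_tag` and the `example.invalid` URL are hypothetical and only serve the demonstration; the prototype itself may be built differently.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def publication_tag(doi: str) -> str:
    """Return an empty <publication> element with doi as a safely quoted attribute."""
    # quoteattr chooses the surrounding quote character and escapes &, <, >,
    # the quote itself where needed, and tab/newline/carriage return as
    # character references so they survive attribute normalization.
    return f"<publication doi={quoteattr(doi)}/>"


if __name__ == "__main__":
    # Hypothetical hostile value, modelled on the example in the comment above.
    nasty = '"><title><script src="https://example.invalid/nasty.js">'
    tag = publication_tag(nasty)
    print(tag)
    # The value stays inside the attribute and round-trips unchanged.
    print(ET.fromstring(tag).get("doi") == nasty)

    # Without escaping, a literal newline or tab in an attribute is
    # normalized by the parser and no longer matches the stored value.
    unescaped = ET.fromstring('<publication doi="10.1000/a\n\tb"/>')
    print(repr(unescaped.get("doi")))
```

For a well-behaved DOI such as 10.1016/j.molcel.2021.06.026 the quoting changes nothing, which is why this only matters for unexpected input.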

ballaschk · 2021-08-06T10:43:58Z

Ah ok. The worst thing that could happen is that the publication list falls apart if one of the admin-editors decides to put in something malicious. You never know, but I think this is pretty improbable. :)

The logs don't tell me much, unfortunately. There are no error messages: according to the logs, the preview rendered just fine (which it didn't, it's empty), and they don't tell me why it won't accept my "submission" and why it thinks the "URL does not appear to be an RSS feed" (which it obviously isn't, but then the plugin is supposed to handle non-RSS XML just fine). I think we would have to enable "debugging logging" somewhere …

donald · 2021-08-06T10:46:54Z

The feed is available on the main site now. https://intranet.molgen.mpg.de/feed/last_publications

ballaschk · 2021-08-06T10:49:28Z

Great, thank you. I think we are done here. Let's see if Peter can figure out where the problem is with the renderer :(

donald · 2021-08-06T11:06:25Z

I've just set the loglevel to debug. Maybe you can get an idea now?

ballaschk · 2021-08-06T11:08:32Z

I don't have permissions to read the log file anymore …

donald · 2021-08-06T12:51:58Z

Oops. Yes, it created a new file when I restarted, sorry. Should work again.

ballaschk · 2021-08-06T14:18:48Z

Sadly, it doesn't give me more info on the feed than "unable to fetch or parse feed - https://intranet.molgen.mpg.de/feed/last_publications, feed could not be parsed" and that's it.

Given that not even the example used in the plugin documentation seems to work, there is probably a problem deeper in the system and not something that I could fix by changing a setting. I asked on the Concerto forum whether someone has experienced a similar problem. Our Concerto installation is also some kind of beta version, so running into some issues may not be entirely unexpected.

donald · 2021-08-12T11:44:59Z

Okay, in our signage installation there seems to be concerto_simple_rss-1.1 installed:

signage@pitti:/project/signage/concerto/vendor/bundle/ruby/2.5.0/gems/concerto_simple_rss-1.1

Sadly, https://github.com/concerto/concerto-simple-rss doesn't have a "1.1" release, just "1.0" (latest) and "1.2" (pre-release).

Anyway, neither the installed "1.1" version nor the "1.0" or the "1.2" release seem to support XML feeds (just RSS). The changes to support other XML feeds are on the master branch only:

Just look at https://github.com/concerto/concerto-simple-rss/tree/1.2 and the README.md no longer talks about XML feeds...

The different feed type modules ( https://github.com/concerto/concerto-simple-rss/tree/master/app/models/feeders ) don't even exist in 1.2 ( https://github.com/concerto/concerto-simple-rss/tree/1.2/app/models/feeders )

I don't feel confident enough with Ruby, gems, rake, and our Concerto installation to try to replace the 1.1 library with the work-in-progress library from GitHub's master branch.

So maybe I'll try to create an RSS feed as well, and we'll see if that works?

ballaschk · 2021-08-12T14:04:15Z

What a mess! Sorry, I just didn't know that the documentation is crap …

Sounds like a good idea, let's try it this way. I guess there are only a few required extra fields and I will be able to reformat the whole thing with XSLT either way.

Thank you!

donald · 2021-08-13T07:03:20Z

I've applied #120 to the main site so https://intranet.molgen.mpg.de/feed/last_publications is now an RSS feed. Can you retry?

Strictly speaking, this is invalid RSS, because we have HTML elements in the title. In the long run this should be fixed on the Intranet site. But I don't think the Ruby parser would reject the feed for that reason, so you can give it a try.

Note that our attribute (pl:doi) and elements (pl:authors, pl:title, pl:source, pl:groups) are in their own XML namespace, as required by RSS 2.0. Our elements also contain HTML elements, which they shouldn't.
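For readers who want to see what such a namespaced feed looks like, here is a rough sketch with Python's `xml.etree.ElementTree`. It is not the code from #120. The namespace URI is the one declared in the XSLT posted in this thread; the function name, the channel title, link and description, and the placement of `pl:doi` on `item` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in the XSLT posted in this thread.
PL_NS = "http://www.molgen.mpg.de/xml/publication_feed"
FIELDS = ("authors", "title", "source", "groups")


def build_feed(publications) -> str:
    """Serialize a list of publication dicts as RSS 2.0 with pl: extension elements."""
    ET.register_namespace("pl", PL_NS)
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    # The values below are placeholders, not the real feed's metadata.
    ET.SubElement(channel, "title").text = "Latest publications"
    ET.SubElement(channel, "link").text = "https://intranet.molgen.mpg.de/"
    ET.SubElement(channel, "description").text = "Most recent publications"
    for pub in publications:
        # Assumption: the doi is attached to <item> as a namespaced attribute.
        item = ET.SubElement(channel, "item", {f"{{{PL_NS}}}doi": pub["doi"]})
        ET.SubElement(item, "title").text = pub["title"]
        for field in FIELDS:
            ET.SubElement(item, f"{{{PL_NS}}}{field}").text = pub[field]
    # ElementTree escapes &, < and > in text and attribute values on output.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed([{
        "doi": "10.1016/j.molcel.2021.06.026",
        "authors": "Arnold M, Bressin A, Jasnovidova O, Meierhofer D, Mayer A.",
        "title": "A BRD4-mediated elongation control point primes transcribing "
                 "RNA polymerase II for 3′-processing and termination.",
        "source": "Mol Cell.",
        "groups": "(Mayer Lab)",
    }]))
```

Building the tree and letting the serializer do the escaping avoids hand-written string concatenation, which is where invalid markup usually sneaks in.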

ballaschk · 2021-08-13T09:34:04Z

Works great! It's live already: http://concerto.molgen.mpg.de/frontend/1?preview=true

Since I apply HTML afterwards, it should be fine to convert all the fields in our existing publication entries to text and strip them of HTML tags.
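A possible way to do that conversion, sketched with Python's standard `html.parser`. The helper names are hypothetical and this is not an actual migration script for the CMS; it only shows how the rich text values quoted earlier in this thread would reduce to plain text.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect only the text nodes of an HTML fragment and ignore all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(rich: str) -> str:
    """Return the plain text of a rich text field with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(rich)
    collector.close()  # flush any text still buffered by the parser
    return " ".join("".join(collector.parts).split())


if __name__ == "__main__":
    print(strip_tags("<p><i>Cancers </i>11 (9) (2019)</p>"))
    print(strip_tags(
        '<p><a href="https://doi.org/10.1093/nar/gkab208">Conserved DNA '
        "sequence features underlie pervasive RNA polymerase pausing</a>.</p>"
    ))
```

Note that link targets are dropped entirely, so a DOI that only exists as an `href` would have to be copied into the doi field before the markup is stripped.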

For future reference, here is the XSLT I used:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:pl="http://www.molgen.mpg.de/xml/publication_feed">
<xsl:template match="/rss/channel">
<html> 
<body>
  <h1>Latest three papers</h1>
   <xsl:for-each select="item">
    <p><xsl:value-of select="pl:authors"/>&#160;<b><xsl:value-of select="pl:title"/></b>&#160;<i><xsl:value-of select="pl:source"/></i>&#160;<xsl:value-of select="pl:groups"/></p>
   </xsl:for-each>
<p><em>Please submit new papers to news@molgen.mpg.de!</em></p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
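Since debugging inside Concerto was difficult, a quick local check of the feed might be useful. The sketch below mimics what the stylesheet selects (the text of pl:authors, pl:title, pl:source and pl:groups for each item) using Python's standard library. The function name and the sample feed are made up for the example, and the sketch does not reproduce the HTML output of the XSLT.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"pl": "http://www.molgen.mpg.de/xml/publication_feed"}
FIELDS = ("authors", "title", "source", "groups")

# Hypothetical sample, shaped like the feed described in this thread.
SAMPLE = """<rss version="2.0" xmlns:pl="http://www.molgen.mpg.de/xml/publication_feed">
<channel><title>Latest publications</title>
<item>
<pl:authors><p>Arnold M, Mayer A.</p></pl:authors>
<pl:title><p>A title.</p></pl:title>
<pl:source><p><i>Mol Cell.</i></p></pl:source>
<pl:groups><p>(Mayer Lab)</p></pl:groups>
</item>
</channel></rss>"""


def summarize(feed_xml: str, limit: int = 3) -> List[str]:
    """Return one 'Authors Title Source Groups' line per item, up to limit items."""
    root = ET.fromstring(feed_xml)
    lines = []
    for item in root.findall("./channel/item")[:limit]:
        values = []
        for field in FIELDS:
            node = item.find(f"pl:{field}", NS)
            # Like xsl:value-of, use the concatenated text of all descendants.
            values.append("".join(node.itertext()).strip() if node is not None else "")
        lines.append(" ".join(v for v in values if v))
    return lines


if __name__ == "__main__":
    for line in summarize(SAMPLE):
        print(line)
```

If parsing fails here with an `xml.etree.ElementTree.ParseError`, the feed itself is not well-formed and the problem is not on the Concerto side.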

donald · 2021-08-13T10:50:06Z

> http://concerto.molgen.mpg.de/frontend/1?preview=true

Wow, that looks so great. It's a pity that it is only displayed on the foyer screen (who looks at it?) and not published to a broader audience.

ballaschk added the "critical" label (Critical feature we need as soon as possible) Jul 29, 2021

donald mentioned this issue Aug 6, 2021

publications: Add XML feed for last publications #119

Merged

donald mentioned this issue Aug 13, 2021

Publications feed2 #120

Merged

ballaschk closed this as completed Aug 13, 2021

donald mentioned this issue Aug 13, 2021

Data of publication items should be text not RichText #121

Open
