Feed of the Publications list #117

Closed
ballaschk opened this issue Jul 29, 2021 · 21 comments
Labels
critical Critical feature we need as soon as possible

Comments

@ballaschk
Collaborator

Hi Donald,

Can we have some kind of feed generated from the list that I could use in the Concerto signage system and transform with XSLT? Ideally, the fields (authors, title, source, group) would be marked up individually / not summarized into a single field and grouped as items. Then I could select and format them individually. I actually only need the top 3 items.

Just in case this violates some kind of RSS standard, it would be great if the fields could be merged as follows:
Authors <strong>Title</strong> <em>Source</em> Groups

(And if it helps, you could just strip all the fields in the backends of HTML/markup, since the fields are formatted in the exact same way anyway.)

At the moment, I manually add the entries to the Concerto screen, but once the RSS engine of Concerto is running properly, I'd like to automate this and it would be handy to have the feed ready to make it happen …

Best wishes,
Martin

ballaschk added the "critical" label (Critical feature we need as soon as possible) on Jul 29, 2021

@donald
Member

donald commented Aug 6, 2021

Would you prefer an RSS (or Atom) feed or just a naked XML representation of the last three publications? Although an RSS feed might look more standard, the attributes RSS suggests don't really apply (or we don't have the data), such as publication time, link to the article, or channel description. If you are going to parse/translate the XML by means of XSLT anyway, vanilla XML of the data might be enough and easier to understand, and we don't need to invent metadata to make an RSS reader happy.

Both variants are rather easy to implement. No difference in that regard.

@donald
Member

donald commented Aug 6, 2021

I've got a prototype which produces items like

<?xml version="1.0" encoding="utf-8" ?>
<publication-list>
    <publication doi="10.1016/j.molcel.2021.06.026">
        <authors><p>Arnold M, Bressin A, Jasnovidova O, Meierhofer D, Mayer A.</p></authors>
        <title><p>A BRD4-mediated elongation control point primes transcribing RNA polymerase II for 3′-processing and termination.</p></title>
        <source><p><i>Mol Cell.</i></p></source>
        <groups><p>(Mayer Lab)</p></groups>
     </publication>
    [...]
</publication-list>

The <p>s are an unwanted editor artifact, and the <i>s should not be in the database field but in external style. I could migrate that (and make the rich text fields into text fields). However, we sometimes have additional markup like link targets or internal <i>s in the fields:

<title><p><a href="https://doi.org/10.1093/nar/gkab208">Conserved DNA sequence features underlie pervasive RNA polymerase pausing</a>.</p></title>
<title><p><a href="https://pubmed.ncbi.nlm.nih.gov/33639093/">Snapshots of native pre-50S ribosomes reveal a biogenesis factor network and evolutionary specialization</a>.  </p></title>
<title><p><a href="https://www.nature.com/articles/s41586-021-03208-9">Noncoding deletions identify <i>Maenli</i> lncRNA as a limb specific <i>En1</i> regulator</a>. </p></title>
<source><p><i>Cancers </i>11 (9) (2019)</p></source>
<authors><p> Alahmad, A., Paffrath, V., Clima, R., Busch, J. F., Rabien, A., Kilic, E., <i>et al.</i>, ..., Meierhofer, D.</p></authors>

But am I right that currently the CMS doesn't show historical entries anywhere, only the last 5 publications on the front page? Maybe we shouldn't be too concerned with old entries?

To guarantee that valid XML is produced, I'd need to validate and escape the strings from the database, because they might contain illegal characters or invalid markup which messes up the DOM tree. But then the XML would be less readable if internal data was escaped with entity names or encapsulated in <![CDATA[...]]>. To be 100% correct, we might need to generate multiple <![CDATA[...]]> sections if the raw data contains ]]> :-). Sigh, I don't think you can work with JSON?
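
For reference, the two options mentioned above (entity escaping versus CDATA with a split at every ]]>) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code behind the prototype; the helper name to_cdata and the sample string are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_cdata(raw: str) -> str:
    """Wrap raw text in CDATA (hypothetical helper, not from the prototype).

    A literal ']]>' would end the section early, so the section is closed
    after ']]' and a new one is opened before '>'.
    """
    return "<![CDATA[" + raw.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Made-up field content containing markup characters and the CDATA terminator.
raw = "a < b && c ]]> d"

# Option 1: entity escaping (less readable, but a single text run).
escaped_doc = "<title>" + escape(raw) + "</title>"

# Option 2: CDATA sections, split where the data contains ']]>'.
cdata_doc = "<title>" + to_cdata(raw) + "</title>"

# Both variants parse back to the original string.
print(ET.fromstring(escaped_doc).text == raw)
print(ET.fromstring(cdata_doc).text == raw)
```

Either variant round-trips the data; the CDATA form keeps the raw text readable in the feed at the cost of the extra split logic.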

Do we want to migrate the fields from rich text to plain text? That would be cleaner, but extra work in the short term. If not: do we want to guarantee valid (and secure) XML, or do we just trust the CMS author not to do stupid things... Hey, I guess that's you.

Can we trust the doi format (so that we can make it into an attribute)? I can make it an element with free text content as well.

@ballaschk
Collaborator Author

Naked XML should suffice, as the Concerto plugin states that it can parse "RSS and other XML feeds". And as you said, RSS would not really make sense for this type of data.

But then, the example given at the link above does not work in Concerto and gives me the error message "URL does not appear to be an RSS feed". My guess is that there is a problem on the other end (Concerto plugin) that Peter would have to have a look at since I don't understand the code. But it should work in principle and the XML feed is a prerequisite to make progress in that regard …

@donald
Member

donald commented Aug 6, 2021

Prototype: https://intranet2.molgen.mpg.de/feed/last_publications

Can Concerto parse that?

@ballaschk
Collaborator Author

The prototype looks exactly like what we need.

> I could migrate that (and make the rich text fields into text fields). However, we sometimes have additional markup like link targets or internal <i>s in the fields:

The links in the fields were just me being lazy and not bothering with the DOI. Turning them into plain text should be fine.

> But am I right that currently the CMS doesn't show historical entries anywhere, only the last 5 publications on the front page? Maybe we shouldn't be too concerned with old entries?

Correct.

> To guarantee that valid XML is produced, I'd need to validate and escape the strings from the database, because they might contain illegal characters or invalid markup which messes up the DOM tree. […] Sigh, I don't think you can work with JSON?

Hm, wouldn't it be enough if I promise to put only "legal" characters into those text fields? JSON is not an option for Concerto, I guess. This is also just an "internal" feed. So in case I mess up, nothing important blows up.

> Can we trust the doi format (so that we can make it into an attribute)? I can make it an element with free text content as well.

This I don't understand. Isn't it a text-only element already?

@ballaschk
Collaborator Author

ballaschk commented Aug 6, 2021

> Prototype: https://intranet2.molgen.mpg.de/feed/last_publications
> Can Concerto parse that?

It says "Unable to preview. feed could not be parsed" and "URL does not appear to be an RSS feed" but I am still working on the transform markup, maybe I messed this up. Unfortunately, I can't access the Concerto logs to see where it went wrong.

I will figure it out, I just found https://www.w3schools.com/xml/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog

@donald
Member

donald commented Aug 6, 2021

> Unfortunately, I can't access the Concerto logs to see where it went wrong.

less or tail -f of /project/signage/concerto/log/production.log should work for you now.

@ballaschk
Collaborator Author

Your prototype file is perfect, and I managed to write a valid XSLT stylesheet that produces exactly what I want, but Concerto won't parse it.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html> 
<body>
  <h1>Latest three papers</h1>
    <xsl:for-each select="publication-list/publication">
      <p>
        <xsl:value-of select="authors"/>&#160;
        <b><xsl:value-of select="title"/>&#160;</b> 
        <i><xsl:value-of select="source"/>&#160;</i> 
        <xsl:value-of select="groups"/>
      </p>
    </xsl:for-each>
<p><em>Please submit new papers to news@molgen.mpg.de!</em></p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
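
To check outside Concerto what the select expressions in such a stylesheet pick up, the same selection can be mirrored in a few lines of Python. This is only a rough sketch using the standard library (which has no XSLT engine): the sample document is the prototype item quoted earlier in the thread, and render and field_text are hypothetical helper names. Like xsl:value-of, itertext() returns the concatenated text of an element, so the inner <p> and <i> tags disappear from the output.

```python
import xml.etree.ElementTree as ET
from html import escape

# One item copied from the prototype output quoted in this thread.
SAMPLE = """<publication-list>
    <publication doi="10.1016/j.molcel.2021.06.026">
        <authors><p>Arnold M, Bressin A, Jasnovidova O, Meierhofer D, Mayer A.</p></authors>
        <title><p>A BRD4-mediated elongation control point primes transcribing RNA polymerase II for 3′-processing and termination.</p></title>
        <source><p><i>Mol Cell.</i></p></source>
        <groups><p>(Mayer Lab)</p></groups>
    </publication>
</publication-list>"""


def field_text(pub, name):
    """Text content of a child element with nested markup flattened."""
    el = pub.find(name)
    return "".join(el.itertext()).strip() if el is not None else ""


def render(xml_text, limit=3):
    """Build one HTML paragraph per publication, at most `limit` of them."""
    root = ET.fromstring(xml_text)
    lines = []
    for pub in root.findall("publication")[:limit]:
        authors, title, source, groups = (
            escape(field_text(pub, name))
            for name in ("authors", "title", "source", "groups")
        )
        lines.append(
            f"<p>{authors}&#160;<b>{title}</b>&#160;<i>{source}</i>&#160;{groups}</p>"
        )
    return "\n".join(lines)


print(render(SAMPLE))
```

If this prints the expected paragraphs, the feed and the selection logic are fine and the problem is more likely on the Concerto side.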

@donald
Member

donald commented Aug 6, 2021

> Hm, wouldn't it be enough if I promise to put only "legal" characters into those text fields? JSON is not an option for Concerto, I guess. This is also just an "internal" feed. So in case I mess up, nothing important blows up.

Just wanted to make sure that you know who is to blame if the foyer display explodes because of invalid markup :-)

> > Can we trust the doi format (so that we can make it into an attribute)? I can make it an element with free text content as well.
>
> This I don't understand. Isn't it a text-only element already?

Yes. But XML has a restricted character set (no control characters). And in attributes, some legal characters (<, & and the quote character that the XML itself is using; > is commonly escaped as well) need to be escaped, which the prototype doesn't do yet. And attribute data is subject to normalization (e.g. multiple and different whitespace characters might be reduced to a single space). All this doesn't matter if you put only a normal identifier into that field and not something like "><title><script src="https://123.456.7.8/do-something-nasty.js"> :-)
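
A minimal sketch of what attribute escaping could look like, assuming the feed were generated with Python's standard library (the thread does not say how the prototype is implemented). The helper name publication_open_tag is hypothetical. quoteattr from xml.sax.saxutils escapes &, < and > and picks a quote character that does not clash with the data; the regular expression drops characters outside the XML 1.0 character range.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Anything outside the XML 1.0 "Char" production (mostly control characters).
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def publication_open_tag(doi: str) -> str:
    """Opening tag with a safely quoted doi attribute (hypothetical helper)."""
    clean = _ILLEGAL_XML_CHARS.sub("", doi)
    return "<publication doi=" + quoteattr(clean) + ">"


# The hostile value from the comment above ends up as plain attribute data.
nasty = '"><title><script src="https://123.456.7.8/do-something-nasty.js">'
elem = ET.fromstring(publication_open_tag(nasty) + "</publication>")
print(elem.get("doi") == nasty)  # the value survives as data
print(len(elem))                 # number of injected child elements
```

With escaping in place, even the hostile value is read back as a plain attribute string and injects no elements into the tree.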

@ballaschk
Collaborator Author

Ah, OK. The worst thing that could happen is that the publication list falls apart if one of the admin-editors decides to put in something malicious. You never know, but I think this is pretty improbable. :)

The logs don't tell me much, unfortunately. No error messages: according to the log, it rendered the preview just fine (which it didn't, it's empty), and it does not tell me why it won't accept my "submission" and why it thinks the "URL does not appear to be an RSS feed" (which it obviously isn't, but then the plugin is supposed to handle non-RSS XML just fine). I think we would have to enable "debug logging" somewhere …

@donald
Member

donald commented Aug 6, 2021

The feed is available on the main site now. https://intranet.molgen.mpg.de/feed/last_publications

@ballaschk
Collaborator Author

Great, thank you. I think we are done here. Let's see if Peter can figure out where the problem is with the renderer :(

@donald
Member

donald commented Aug 6, 2021

I've just set the loglevel to debug. Maybe you can get an idea now?

@ballaschk
Collaborator Author

I don't have permissions to read the log file anymore …

@donald
Member

donald commented Aug 6, 2021

Oops. Yes, it created a new file when I restarted, sorry. Should work again.

@ballaschk
Collaborator Author

Sadly, it doesn't give me more info on the feed than "unable to fetch or parse feed - https://intranet.molgen.mpg.de/feed/last_publications, feed could not be parsed" and that's it.

Given that not even the example used in the plugin documentation seems to work, there is probably a problem deeper in the system and not something that I could fix by changing a setting. I asked on the Concerto forum if someone has experienced a similar problem. Our Concerto installation is also some kind of beta version, so it may not be entirely unexpected to run into some issues.

@donald
Member

donald commented Aug 12, 2021

Okay, in our signage installation there seems to be concerto_simple_rss-1.1 installed:

signage@pitti:/project/signage/concerto/vendor/bundle/ruby/2.5.0/gems/concerto_simple_rss-1.1

Sadly, https://github.com/concerto/concerto-simple-rss doesn't have a "1.1" release, just "1.0" (latest) and "1.2" (pre-release).

Anyway, neither the installed "1.1" version nor the "1.0" or the "1.2" release seem to support XML feeds (just RSS). The changes to support other XML feeds are on the master branch only:

Just look at https://github.com/concerto/concerto-simple-rss/tree/1.2 and the README.md no longer talks about XML feeds...

The different feed type modules ( https://github.com/concerto/concerto-simple-rss/tree/master/app/models/feeders ) don't even exist in 1.2 ( https://github.com/concerto/concerto-simple-rss/tree/1.2/app/models/feeders )

I don't feel confident enough with Ruby, gems, rake, and our Concerto installation to try to replace the 1.1 library with the work-in-progress library from GitHub's master branch.

So maybe I'll try to create an RSS feed, too, and we'll see if this works?

@ballaschk
Collaborator Author

What a mess! Sorry, I just didn't know that the documentation is crap …

Sounds like a good idea, let's try it this way. I guess there are only a few required extra fields and I will be able to reformat the whole thing with XSLT either way.

Thank you!

@donald
Member

donald commented Aug 13, 2021

I've applied #120 to the main site so https://intranet.molgen.mpg.de/feed/last_publications is now an RSS feed. Can you retry?

Strictly speaking, this is invalid RSS, because we have html elements in the title. In the long run this should be fixed on the Intranet site. But I don't think the ruby parser would reject the feed for that reason, so you can give it a try.

Note that our attribute (pl:doi) and elements (pl:authors, pl:title, pl:source, pl:groups ) are in their own XML namespace as required by RSS 2.0. Our elements also contain HTML elements which they shouldn't.
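
As an illustration of the namespacing described above, here is a rough sketch of how such a feed could be assembled with Python's xml.etree.ElementTree. It is not the implementation from #120. The namespace URI is the one used in the XSLT posted later in this thread; the channel title, link and description values, the placement of pl:doi on the item element, and the function name build_feed are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the XSLT in this thread.
PL_NS = "http://www.molgen.mpg.de/xml/publication_feed"
ET.register_namespace("pl", PL_NS)


def build_feed(publications, limit=3):
    """Serialize up to `limit` publications as RSS 2.0 with pl: extensions."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    # The values below are placeholders, not taken from the real feed.
    ET.SubElement(channel, "title").text = "Last publications"
    ET.SubElement(channel, "link").text = "https://intranet.molgen.mpg.de/"
    ET.SubElement(channel, "description").text = "Most recent publications"
    for pub in publications[:limit]:
        # Assumption: the pl:doi attribute sits on the item element.
        item = ET.SubElement(channel, "item", {f"{{{PL_NS}}}doi": pub["doi"]})
        ET.SubElement(item, "title").text = pub["title"]
        for field in ("authors", "title", "source", "groups"):
            ET.SubElement(item, f"{{{PL_NS}}}{field}").text = pub[field]
    return ET.tostring(rss, encoding="unicode")


# Plain-text version of the prototype item quoted earlier in the thread.
publications = [
    {
        "doi": "10.1016/j.molcel.2021.06.026",
        "authors": "Arnold M, Bressin A, Jasnovidova O, Meierhofer D, Mayer A.",
        "title": "A BRD4-mediated elongation control point primes transcribing "
                 "RNA polymerase II for 3′-processing and termination.",
        "source": "Mol Cell.",
        "groups": "(Mayer Lab)",
    }
]
feed_xml = build_feed(publications)
print(feed_xml)
```

ElementTree escapes text and attribute values on serialization, so plain-text fields cannot break the document structure.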

@ballaschk
Collaborator Author

ballaschk commented Aug 13, 2021

Works great! It's live already: http://concerto.molgen.mpg.de/frontend/1?preview=true

Since I apply html afterwards, it should be fine to convert all the fields in our existing publication entries to text and strip them of HTML tags.
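
Converting the existing rich text fields to plain text could be done roughly as follows. This is a sketch under the assumption that the migration is scripted in Python, which the thread does not state; strip_markup is a hypothetical helper, and the two inputs are field values quoted earlier in this issue.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(rich: str) -> str:
    """Return the field content without tags, with whitespace normalized."""
    parser = _TextOnly()
    parser.feed(rich)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser.parts).split())


print(strip_markup("<p><i>Cancers </i>11 (9) (2019)</p>"))
print(strip_markup(
    '<p><a href="https://www.nature.com/articles/s41586-021-03208-9">'
    "Noncoding deletions identify <i>Maenli</i> lncRNA as a limb specific "
    "<i>En1</i> regulator</a>. </p>"
))
```

Note that this drops link targets such as the DOI URLs along with the tags, so any link that should be kept would have to be moved to a separate field first.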

For future reference, here is the XSLT I used:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:pl="http://www.molgen.mpg.de/xml/publication_feed">
<xsl:template match="/rss/channel">
<html> 
<body>
  <h1>Latest three papers</h1>
   <xsl:for-each select="item">
    <p><xsl:value-of select="pl:authors"/>&#160;<b><xsl:value-of select="pl:title"/></b>&#160;<i><xsl:value-of select="pl:source"/></i>&#160;<xsl:value-of select="pl:groups"/></p>
   </xsl:for-each>
<p><em>Please submit new papers to news@molgen.mpg.de!</em></p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

@donald
Member

donald commented Aug 13, 2021

> http://concerto.molgen.mpg.de/frontend/1?preview=true

Wow, that looks so great. It's a pity that it is only displayed on the foyer screen (who looks at it?) and not published to a broader audience.

2 participants