Hacker News: mentatseb's comments

The tech story by NASA's chief knowledge architect is covered in more detail at https://linkurio.us/blog/how-nasa-experiments-with-knowledge... and https://neo4j.com/blog/nasa-critical-data-knowledge-graph/, with a presentation video at https://www.youtube.com/watch?v=vwJyU9vsfmU

Disclaimer: Linkurious CEO here. Linkurious is the tool used to explore the Neo4j graph database at NASA.


Since Linkurious is aimed at the enterprise, what would you recommend as a personal knowledge database for individual users?


Not your parent, but I've recently toyed with TiddlyWiki, which shows promise, but requires heavy configuration.

There is also an extension to it called TiddlyMap which displays several of the properties mentioned in this article (edges with properties, etc), but again, requires configuration to get just so.

If you're game to do some tinkering, I've found it to be hackable to some very deep levels. Another nicety is that it's all just a single HTML file, so it's madly portable (I can use the same site on my phone and laptop).

All this being said, there is a growing list of features I'd like to see in TiddlyWiki that I'm not sure I can hack in myself, so I suppose I, too, am looking for the "one true knowledge management" solution.


Just start with Apache Jena: it's standards-based, with RDF as the exchange format and SPARQL as the query language. Other solutions may use proprietary stuff for better vendor lock-in; it's completely up to you if you want that. But with Apache Jena you can later move to other KG databases. Also, Apache Jena is easy to work with, since it includes Fuseki, so you can start using it directly as a web API.

https://jena.apache.org/ https://jena.apache.org/documentation/fuseki2/

Once you need "big data" for your personal Knowledge Graph, you can use other RDF stores, without vendor lock-in.
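To give a concrete feel for the Fuseki web API: a minimal sketch (the dataset name "personal-kg" and the triple pattern are made up for illustration) that builds a SPARQL query URL against a local server on Fuseki's default port 3030.

```python
# Minimal sketch: querying a local Fuseki server over HTTP.
# Fuseki exposes a standard SPARQL endpoint at /<dataset>/sparql
# that accepts the query in a ?query= parameter.
import urllib.parse

query = """
SELECT ?note ?title WHERE {
  ?note <http://purl.org/dc/terms/title> ?title .
}
LIMIT 10
"""

endpoint = "http://localhost:3030/personal-kg/sparql"
url = endpoint + "?" + urllib.parse.urlencode({"query": query})

# With the server running, urllib.request.urlopen(url) would return
# the results (send "Accept: application/sparql-results+json" for JSON).
print(url.startswith(endpoint))  # True
```

Because this is all standard SPARQL over HTTP, the same request works unchanged against any other RDF store later, which is the no-lock-in point above.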


There is TheBrain, which has a free-of-charge personal offering, as well as paid service tiers.

Jerry Michalski is among the more notable users.

https://www.jerrysbrain.com

http://old.thebrain.com/store/faq/#Section1.1


I've been working on building a personal knowledge database tool recently, feel free to shoot me an email at antimatter15@gmail.com if you'd like to be one of the first to try it out.


Due to the number of crawlers on this site, I recommend (if it's not too late) you edit your post to use the format

address at domain dot com :)

PS: Sent you an email. :)


If you are after a graph database for personal use, Segrada (Segrada.org) is a nice open-source UI on top of OrientDB.

Otherwise, see Marviel's and DredMorbius's suggestions; both are worth checking out.


Segrada looks very interesting, thank you very much!


It depends on your needs, maybe try the SaaS app https://kumu.io/ or https://graphcommons.com/


"Chief Knowledge Officer", cool title.


I know a guy whose title is "President of Intelligence".


It's hard to beat a NASA job title, "Planetary Protection Officer". [1]

[1] https://sma.nasa.gov/sma-disciplines/planetary-protection


a.k.a. a librarian.


Using a librarian to manage knowledge in an organization sounds like a no-brainer. They are trained for exactly that!


I found your product this week when searching for a Neo4j visualization tool, but I couldn't try it on anything other than an example database. Is there any way to try or use it as a researcher?


Amazing, thanks for sharing!


Hi, I'm a Gephi + Linkurious co-founder. I've found visualizing large graphs pretty useless beyond the "I see meatballs!" effect, and my opinion, after a decade in the field, is that it's the wrong problem for data analytics.

Much more interesting information is discovered during the process of dynamically building a visualization that is focused on user questions. I see with Linkurious that investigators usually need to visualize fewer than 1,000 edges of a 1M+ edge graph to get answers.


The ultimate answer is generally a small graph: Graphistry is a tool that helps you get there. What makes that hard is that most Splunk, Spark, etc. queries return a bunch of events, and each event has a bunch of metadata. A tool should help, not fall over.

I think you're referring to scenarios closer to why we created the visual playbook concept and our embedding APIs. Small visualizations are often a good starting point in investigative scenarios. Even better: no visualization, just full automation. We find this thinking comes up when the investigative flow is more established and curated. With visual playbooks, teams can record and automate multistep flows, run them whenever an incident happens, take action, and share and document the results. If part of the incident involves a bunch of events, or the analyst wants to dig in, our stack won't fall over. Instead, it provides a full visual analytics session with multiple cross-linked data views.

And we're fans of Gephi. We GPU accelerated the core algorithm -- we may be coming from a different perspective and user base.


Yup, it's important that people understand the role of visualization in the complete data chain.


I'm not sure I understand. Is there a resource that explains the role of visualizations in data flows in the context explained here?


It's basically applied actor-network theory from sociology, and the delegation of control to objects: https://en.wikipedia.org/wiki/Actor%E2%80%93network_theory


The conclusion is misleading due to 2 wrong assumptions:

1. The population is heterogeneous: interviews test different skills. Interviews don't all test the same set of skills, which would be mandatory for comparing interview scores, because scores are aggregates of these skill tests. Different job opportunities mean different skills to test, so it seems reasonable to assume that candidates are evaluated differently for different job opportunities, and thus that their scores vary across interviews.

2. The observations are not statistically independent: past interviews may influence future interviews. People may get better at passing, or conducting, interviews over time, which would affect their scores. It would be good to study the evolution of individual scores over time.

While (1) should strongly limit the conclusions of the study, the complete analysis may simply be irrelevant because of (2) if the statistical independence of the observations is not demonstrated. Sorry guys, but this is Statistics 101.
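For point (2), a minimal sketch (using simulated data, not the study's) of one way to check whether scores depend on the previous interview: compute the lag-1 autocorrelation of a candidate's score sequence.

```python
# Sketch: lag-1 autocorrelation as an independence check on simulated
# interview scores. Near 0 suggests scores don't depend on the previous
# interview; strongly positive suggests a learning effect.
import random
import statistics

def lag1_autocorrelation(scores):
    mean = statistics.fmean(scores)
    var = sum((s - mean) ** 2 for s in scores)
    cov = sum((a - mean) * (b - mean) for a, b in zip(scores, scores[1:]))
    return cov / var

random.seed(0)
# Independent scores around a fixed ability level.
independent = [random.gauss(3.0, 1.0) for _ in range(500)]
# A "learning" candidate: each score leans on the previous one (AR(1)).
learner = [3.0]
for _ in range(499):
    learner.append(0.8 * learner[-1] + random.gauss(0.6, 1.0))

print(round(lag1_autocorrelation(independent), 2))  # near 0
print(round(lag1_autocorrelation(learner), 2))      # clearly positive
```

On real data one would compute this per candidate over their interview history; a consistently positive value would undermine the independence assumption.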


(1) We listened to most interviews on the platform to establish homogeneity. Interviews were across the board, language agnostic, and primarily algorithmic in nature.

(2) We actually looked into this and noticed that time didn't really affect performance. Usually, people did their interviews over a pretty short time span and then found a job. Or, people were already experienced interviewers and had kind of hit a plateau. You can see the raw data and how it oscillates wrt time in the footnotes.


Watch this conference talk about how to choose a graph visualization library: https://www.youtube.com/watch?v=7BPbaApIOrc

Disclaimer: the talk is given by an engineer who has worked on the sigma.js fork https://github.com/Linkurious/linkurious.js


Technical side: we recommend displaying up to 2,000 nodes and edges. Laptops less than two years old can display and lay out graphs of up to 4,000 nodes and edges, but with stability issues.

Cognitive side: we recommend hiding nodes and edges as soon as you don't need them. One cannot ask the same class of questions of graph visualizations of very different sizes; see slide 19 on http://www.slideshare.net/Cloud/sp1-exploratory-network-anal...
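The "hide what you don't need" advice can be sketched as a pre-render filter (a toy example, not Linkurious code) that keeps only the highest-degree nodes within a node budget:

```python
# Sketch: trim a graph to its highest-degree nodes before rendering,
# keeping it under a node budget (e.g. the 2,000-node recommendation).
from collections import Counter

def trim_graph(edges, max_nodes=2000):
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    keep = {n for n, _ in degree.most_common(max_nodes)}
    # Keep only edges whose endpoints both survive the cut.
    return [(a, b) for a, b in edges if a in keep and b in keep]

# Toy graph: one hub with ten spokes, plus two stray edges.
edges = [("hub", f"n{i}") for i in range(10)] + [("n1", "n2"), ("x", "y")]
print(len(trim_graph(edges, max_nodes=3)))  # 3 edges survive
```

A real tool would filter on the user's current question (properties, timestamps, neighborhoods) rather than raw degree, but the principle is the same: shrink the graph before asking the renderer and the reader to cope with it.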


The biggest dataset used by one of our users is a genetic graph of 240 million nodes and edges on a single server. Linkurious takes a few hours to index the complete dataset. From there, the search engine delivers instant results with autocomplete, fuzziness, and advanced query options; graph exploration queries take less than a second to complete (sometimes a bit more depending on the web client). We are still working on improving our data indexing strategy to gain performance.
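A minimal sketch of the kind of prefix autocomplete such an index enables (the labels are invented; a real engine adds fuzziness and advanced queries on top):

```python
# Sketch: O(log n) prefix autocomplete over a sorted index of node
# labels, using binary search to jump to the first match.
import bisect

def autocomplete(sorted_labels, prefix, limit=5):
    i = bisect.bisect_left(sorted_labels, prefix)
    out = []
    while i < len(sorted_labels) and sorted_labels[i].startswith(prefix):
        out.append(sorted_labels[i])
        i += 1
        if len(out) == limit:
            break
    return out

labels = sorted(["gene:BRCA1", "gene:BRCA2", "gene:TP53", "protein:p53"])
print(autocomplete(labels, "gene:BRCA"))  # ['gene:BRCA1', 'gene:BRCA2']
```

The point of indexing up front is exactly this shape: hours of preprocessing buy sub-second lookups at query time, independent of total graph size.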

Synerscope has a strong approach to data analysis, and Danny Holten is well known in the infovis community. I don't think they provide a search engine, though; you probably have more information on their product.


Thanks! Worth also mentioning Sigma.js, the open source core on which we are building our toolkit.


Yeah, building a great graph engine is hard!

I like a lot about:

-- Cola (flexible constraint-based): http://marvl.infotech.monash.edu/webcola/

-- Vivagraph (big graphs in webgl) doesn't get enough love: http://www.yasiv.com/graphs#Bai/rw496

Getting the best of both is hard, hence our real-time GPU clusters. It'll be a while before we can usefully open up that layer =/


Cola has great algorithms! Hopefully they will be implemented as a Sigma plugin (https://github.com/qinfchen/sigmajs-webcola).

Still, GPU clusters beat all for performance. I'd love to see a demo :)


Thanks, our backend is based on node.js, so we're considering supporting the Rexster server (https://github.com/thinkaurelius/titan/wiki/Rexster-Graph-Se...), which provides a REST API to Gremlin. What do you think about it?

We're also eager to see any binary protocol for communicating with graph databases.
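As a rough sketch of what a Rexster integration involves (the graph name and the Gremlin script are made up; 8182 is Rexster's default port), the Gremlin extension takes a script as a query parameter:

```python
# Sketch: building a request against Rexster's Gremlin REST extension,
# which evaluates a script passed in the ?script= parameter.
import urllib.parse

script = "g.V.has('name', 'hercules').out('father')"
url = ("http://localhost:8182/graphs/graph/tp/gremlin?"
       + urllib.parse.urlencode({"script": script}))

# With a running server, urllib.request.urlopen(url) would return a
# JSON envelope containing a "results" array.
print(url.startswith("http://localhost:8182/graphs/graph/tp/gremlin"))
```

Since it is plain HTTP plus JSON, any backend, node.js included, can talk to it without a driver; the trade-off versus a binary protocol is per-request overhead.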


Licenses come with a 1-month money back guarantee. Drop me an email to seb@linkurio.us to discuss your project.

